(54) Wireless communication system and method having a space-time architecture, and receiver
for multi-user detection



(57) A threaded space-time (TST) architecture in a 
multiple antenna wireless communication system uses 
the coded transmission in each layer of a transmission 
resource array as a space-time code. Each layer of a 
layer set is active during all available symbol transmission
intervals, and each of the transmit antennas is
used equally often, such that the layers each transmit a
symbol using a different antenna during each symbol
transmission interval. A receiver is provided for multi-user
reception using an iterative, soft-input/soft-output
(SISO) multi-user detection algorithm based on the minimum
mean square error (MMSE) criterion, among other
methods.
Description

[0001] This application claims the benefit of provisional U.S. application Serial No. 60/143,293, filed July 12, 1999. 

Cross Reference to Related Applications

[0002] Related subject matter is disclosed in U.S. patent application Serial No. 09/397,896, filed September 17,
1999, and U.S. patent application of A. Roger Hammons et al. for "Method of Generating Space-Time Codes for Generalized
Layered Space-Time Architectures", filed even date herewith (Attorney's docket PD-9900238), the entire contents
of both of said applications being expressly incorporated herein by reference.

Field of the Invention 

[0003] The invention relates generally to a method of symbol transmission employing space-time codes in a multiple
antenna wireless communication system. The invention also relates to a method and apparatus for space-time signal
processing and multi-user detection and decoding in a multiple antenna wireless communication system.

Background of the Invention 

[0004] Unlike the Gaussian channel, the wireless channel suffers from multi-path fading. In such fading environments,
reliable communication is made possible only through the use of diversity techniques in which the receiver is
afforded multiple replicas of the transmitted signal under varying channel conditions. Recently, information theoretic
studies have shown that spatial diversity provided by multiple transmit and/or receive antennas allows for a significant
increase in the capacity of wireless communication systems operated in a Rayleigh fading environment. Following this
research, two approaches for exploiting this spatial diversity have been proposed.

[0005] In accordance with one approach, channel coding is performed across the spatial dimension, as well as 
time, to benefit from the spatial diversity provided by using multiple transmit antennas. Accordingly, the term "space- 
time codes" is used in connection with this coding scheme. One potential drawback of this scheme is that the complexity 
of the maximum likelihood (ML) decoder is exponential in the number of transmit antennas. 

[0006] A second approach relies on complex signal processing techniques at the receiver to achieve performance
asymptotically close to the outage capacity. In this approach, no effort is made to optimize the channel coding scheme.
Conventional single-dimensional channel codes are used to minimize complexity. This approach is referred to as the
layered space-time (LST) architecture. The LST architecture involves formulating the problem as a multi-user detection
problem at the receiver and, hence, capitalizing on existing multi-user detection techniques in the receiver design. A
proposed algorithm is based on a combination of decision feedback interference cancellation and zero-forcing interference
avoidance. One drawback of the LST architecture is that the number of receive antennas must be at least equal
to the number of transmit antennas. The LST signal processing does not gain the maximum diversity advantage that
space-time coding offers. At low signal-to-noise ratios, this approach may suffer from error propagation resulting from
the decision feedback cancellation.


Summary of the Invention 

[0007] In accordance with the present invention, novel solutions to problems associated with designing multiple 
antenna wireless systems are presented. 
[0008] In accordance with an aspect of the present invention, a receiver is provided for multi-user reception. The
receiver provides for joint detection and decoding. 

[0009] In accordance with another aspect of the present invention, a set of lower complexity reception techniques
based on the turbo processing architecture is presented. These techniques provide a trade-off between complexity and
performance. Joint detection and decoding algorithms based on iterative soft-input/soft-output (SISO) approaches
are provided. These algorithms avoid the limitations of the LST signal processing techniques, including the need for
an equal number of transmit and receive antennas.

[0010] In accordance with yet another aspect of the present invention, a transmitter employs space-time coding to
improve the efficiency of multiple antenna systems. A general architecture that combines efficient algebraic code
design with advanced signal processing techniques is employed and is referred to as the threaded space-time (TST)
architecture. The TST architecture also allows for exploiting the temporal diversity provided by the time varying fading
channel. The existing scheme for combined array processing and space-time coding described above, which likewise
addresses some of the problems encountered with LST, relies upon a zero-forcing group interference suppression technique
and shows performance that is 6 - 9 dB from the outage capacity. The TST architecture and signal processing of
the present invention, however, improve performance to less than 3 dB from the outage capacity. They also provide
greater flexibility in terms of the trade-off between power efficiency, bandwidth efficiency, and receiver complexity.

Brief Description of the Drawings 

[0011] The various aspects, advantages and novel features of the present invention will be more readily comprehended
from the following detailed description when read in conjunction with the appended drawings, in which:

Figure 1 is a block diagram of a multiple antenna wireless communication system constructed in accordance with 
an embodiment of the present invention; 

Figure 2 illustrates a code word matrix encoded and transmitted in accordance with a known layered space-time 
architecture; 

Figure 3 illustrates space-time codes transmitted in accordance with a known multi-layered space-time architec- 
ture; 

Figure 4 is a block diagram of a receiver constructed in accordance with an embodiment of the present invention; 

Figure 5 is a block diagram of a receiver constructed in accordance with an embodiment of the present invention; 

Figure 6 illustrates a threaded code word matrix constructed using a threaded space-time architecture in accordance
with an embodiment of the present invention;

Figures 7 and 8 are graphs illustrating the performance of a receiver constructed in accordance with an embodi- 
ment of the present invention; and 

Figures 9, 10 and 11 are graphs illustrating the performance of a threaded space-time architecture implemented in
accordance with an embodiment of the present invention.

[0012] Throughout the drawing figures, like reference numerals will be understood to refer to like parts and components.

Detailed Description of the Preferred Embodiments 

[0013] The description below shall be organized as follows: the system description and a brief review of previous 
work on the design of space-time modems are presented in Section 1. In Section 2, the optimal receiver for joint detec- 
tion and decoding is identified, and a set of iterative receivers that provide a trade-off between complexity and perform- 
ance is presented. The application of iterative receivers to the layered space-time architecture is discussed in Section 
2.3. In Section 3, a novel approach for joint space-time transmitter/receiver design is presented that combines efficient 
multi-user detection with space-time coding. Algebraic space-time code constructions for the new architecture are pro- 
vided in Section 3.2. Comparisons of the various layered architectures in terms of efficiency and achievable diversity 
order are presented in Section 4, while simulation results are compared in Section 5. Finally, Section 6 presents con- 
clusions. 

1. Overview of Space-Time Concepts

[0014] In this section, the basic concepts for space-time signal design and signal processing are described. Important
concepts involved in space-time codes, layered space-time processing, and a proposed hybrid multi-layered
approach are briefly explained.

1.1 Signal Model 

[0015] A multiple antenna communication system 10 with n transmit antennas 18 and m receive antennas 24 is
shown in Figure 1. In this system 10, the channel encoder 20 in the transmitter 14 accepts input from an information
source 12 and outputs a coded stream of higher redundancy suitable for error correction processing at the receiver 16.
The encoded output stream is modulated via a spatial modulator 22 and distributed among the n antennas 18. The
transmissions from each of the n transmit antennas 18 are simultaneous and synchronous. The signal received at each
antenna 24 is therefore a superposition of the n transmitted signals corrupted by additive white Gaussian noise and
multiplicative fading. The signal is processed by a demodulator 26 and a decoder 28 and provided to an information sink
30.

[0016] Assume that the transmitter 14 is capable of an aggregate transmission rate of $nR_s$ symbols per second
(i.e., a transmission rate of $R_s$ symbols per second per transmit antenna). Then, over a transmission time of $T$ seconds,
the transmitter 14 may transmit up to $\ell = R_s T$ channel symbols per antenna. The space-time transmission resources
may therefore be viewed as an $n \times \ell$ array whose $(i, t)$-th entry represents the $t$-th symbol interval available on the $i$-th
antenna. The dimension indexed by $i$ is referred to as the spatial dimension, whereas the dimension indexed by $t$ is
called the temporal dimension.
[0017] In Figure 1, the channel encoder 20 is a generic function and, in many cases of interest, can be decomposed
into a set of multiple, independent channel encoders processing separate substreams from the information source
12. When the channel encoder 20 is decomposable, there is a corresponding partitioning of the spatial modulating function
that is of interest. The components of such a partitioning are referred to as layers or multi-layers.
[0018] At the receiver 16, the signal $r_t^j$ received by antenna $j$ at time $t$ is given by

$$r_t^j = \sqrt{E_s} \sum_{i=1}^{n} \alpha_t^{(i,j)} c_t^i + n_t^j \qquad (1)$$

where $E_s$ is the energy per transmitted symbol; $\alpha_t^{(i,j)}$ is the complex path gain from transmit antenna $i$ to receive
antenna $j$ at time $t$; $c_t^i$ is the symbol transmitted from antenna $i$ at time $t$; and $n_t^j$ is the additive white Gaussian noise
sample for receive antenna $j$ at time $t$. The noise samples are independent samples of a zero-mean complex Gaussian random
variable with variance $N_0/2$ per dimension. The different path gains $\alpha_t^{(i,j)}$ are assumed to be statistically
independent. The fading model of primary interest is that of a block flat Rayleigh fading process in which the code word
encompasses $B$ fading blocks. The complex fading gains are constant over one fading block but are independent from
block to block. The quasi-static fading model, which has also been studied, is a special case of the block fading model in which
$B = 1$.

[0019] The received signal can be expressed in vector notation as

$$\mathbf{r}_t = \sqrt{E_s}\, S_t \mathbf{c}_t + \mathbf{n}_t \qquad (2)$$

where $\mathbf{r}_t$ is the $m \times 1$ received vector at time $t$; $S_t$ is the $m \times n$ complex signature matrix whose $i$-th column corresponds
to the path gains for the $i$-th antenna; $\mathbf{c}_t$ is the $n \times 1$ transmitted vector at time $t$; and $\mathbf{n}_t$ is the $m \times 1$ white Gaussian noise
vector.
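The vector model of equation (2) can be illustrated with a short numerical sketch. The following Python code is only an illustration under stated assumptions: the function name `simulate_received` is hypothetical, the path gains are drawn with unit variance, and a single fading block (the quasi-static case) is simulated.

```python
import numpy as np

def simulate_received(c, m, Es=1.0, N0=1.0, rng=None):
    """Return (r, S) for one fading block of the model r_t = sqrt(Es) S c_t + n_t.

    c : n x ell array of transmitted symbols (rows = antennas, columns = time)
    m : number of receive antennas
    """
    rng = np.random.default_rng() if rng is None else rng
    n, ell = c.shape
    # Assumption: Rayleigh fading with i.i.d. zero-mean, unit-variance complex
    # Gaussian gains, held constant over the whole block (quasi-static, B = 1).
    S = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    # Complex white Gaussian noise with variance N0/2 per real dimension.
    noise = np.sqrt(N0 / 2) * (
        rng.standard_normal((m, ell)) + 1j * rng.standard_normal((m, ell))
    )
    r = np.sqrt(Es) * S @ c + noise
    return r, S
```

Each column of `r` is one received vector, so a receiver with m antennas observes all n transmitted symbols of a symbol interval superimposed in every entry of that column.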

[0020] The system 10 provides not one, but nm, communication links between sender and receiver, corresponding
to each distinct transmit/receive antenna pairing. The objective of space-time system design is to use these statistically
independent, but mutually interfering, communication links to increase system throughput and quality of service by
exploiting the spatial and temporal diversity available in the system.

1.2 Space-Time Channel Codes 

[0021] For space-time channel code design, assume that the channel encoder 20 of Figure 1 is indecomposable.
The primary design objective is therefore to provide channel codes that exploit the full transmission resource array and 
provide the highest level of spatial diversity at the receiver 16. 

[0022] In the concept of a space-time code, the channel encoding, modulation, and distribution of symbols across
antennas are intrinsically connected. Given a set $X$, the space of $1 \times m$ row vectors and the space of $n \times m$ matrices
taking values in $X$ will be denoted by $X^m$ and $X^{n \times m}$, respectively. Then, a block code of length $N$ over the discrete symbol
alphabet $\mathcal{Y}$ is a subset $C$ of the $N$-dimensional space $\mathcal{Y}^N$. Usually, the number of code words in $C$ is a power of
the alphabet size,

$$|C| = |\mathcal{Y}|^k,$$

so that there is a one-to-one mapping, $\gamma : \mathcal{Y}^k \to C$, of information $k$-tuples onto code words. The mapping $\gamma$ is an
encoder for $C$. Of primary interest here is the case in which $C$ is a binary linear code, i.e., $\mathcal{Y}$ is the
elementary binary field $\mathbb{F}$.
[0023] The baseband modulation mapping $\mu : \mathcal{Y}^b \to \Omega$ assigns to each $b$-tuple of alphabet symbols a unique point
in the discrete, complex-valued signaling constellation $\Omega$, which is assumed not to contain the point zero. Conversely,
the inverse map $\mu^{-1}$ provides a $b$-symbol labeling of the constellation points. By extension, $\mu(\mathbf{x})$ denotes the modulated
version of the vector $\mathbf{x} \in \mathcal{Y}^N$. In this case, it is understood that $N$ must be a multiple of $b$ and that the blocking of symbols
into $b$-tuples for the modulator is performed left to right.

[0024] Let $\Omega^+ = \Omega \cup \{0\}$ denote the expanded constellation. Then, the spatial modulator is a mapping
$f : \mathcal{Y}^N \to (\Omega^+)^{n \times \ell}$ that sends the vector $\mathbf{x}$ to an $n \times \ell$ complex-valued matrix

$$\mathbf{c} = f(\mathbf{x})$$

whose non-zero entries are a rearrangement of the entries of $\mu(\mathbf{x})$. Specifically, $\mathbf{c}$ is the baseband version of the code
word $\mathbf{x}$ as transmitted across the channel. Thus, in the notation of equation (1), the matrix $\mathbf{c}$ has $(i, t)$-th entry equal to
$c_t^i$. Note that, in this formulation, it is expressly allowed that no symbol be transmitted by a given antenna at a given
signaling interval; thus, $N/b \le n\ell$. $n$ and $\ell$ are referred to, respectively, as the spatial span and temporal span of $f$.
[0025] Finally, for convenience, let $\sigma(\mathbf{x})$ denote the $n \times b\ell$ matrix in which each constellation point is replaced by its $b$-symbol label and any zero entry is
replaced by a $b$-tuple of special blank symbols. The map $\sigma : \mathbf{x} \mapsto \sigma(\mathbf{x})$ is called the spatial formatter.
[0026] Definition 1 A space-time code $\mathcal{C}$ consists of an underlying channel code $C$ together with the spatial modulator
function $f$.

[0027] The fundamental performance parameters for space-time codes are the following: (1) diversity advantage,
which describes the exponential decrease of decoded error rate versus signal-to-noise ratio (asymptotic slope of the
performance curve in a log-log scale); and (2) coding advantage, which does not affect the asymptotic slope but results
in a shift in the performance curve. The diversity advantage is the more critical of the two performance metrics as it
determines the asymptotic slope of the performance curve. Ideally, the coding advantage should be optimized after the
diversity advantage is maximized.

[0028] For quasi-static fading channels, it has been shown that the spatial diversity advantage of the code, assuming
ML decoding, is the product of the number of receive antennas 24 and the minimum rank among the set of complex
valued matrices associated with the difference between baseband modulated code words. It is clear that full spatial
diversity nm will be achieved if and only if all the difference matrices have full rank. Based on this design criterion, simple
design rules have been proposed for space-time trellis codes for 2-level spatial diversity.
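The rank criterion stated above lends itself to a direct numerical check. The sketch below is an illustration only: the function names and the toy BPSK delay-diversity code are hypothetical choices, not codes taken from this description. It computes m times the minimum rank over all pairwise code word difference matrices.

```python
import itertools
import numpy as np

def spatial_diversity(codewords, m):
    """Return m times the minimum rank over all pairwise difference matrices."""
    min_rank = min(
        int(np.linalg.matrix_rank(a - b))
        for a, b in itertools.combinations(codewords, 2)
    )
    return m * min_rank

def delay_diversity_codewords():
    """Hypothetical toy code for n = 2 antennas over 3 symbol intervals.

    Each BPSK pair (s1, s2) is sent as [s1, s2, 0] on the first antenna and,
    delayed by one interval, as [0, s1, s2] on the second antenna.
    """
    return [
        np.array([[s1, s2, 0.0], [0.0, s1, s2]])
        for s1, s2 in itertools.product([-1.0, 1.0], repeat=2)
    ]
```

For this toy code every difference matrix has rank 2, so with m = 2 receive antennas the function returns 4, i.e., full spatial diversity nm. Sending the same symbol on both antennas in the same interval would instead give difference matrices of rank 1.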

Rule 1. Transitions departing from the same state differ only in the second symbol.

Rule 2. Transitions merging at the same state differ only in the first symbol.

When these rules are followed, the code word difference matrices are of the form

$$f(\tilde{\mathbf{x}}) - f(\mathbf{x}) = \begin{bmatrix} \delta_1 & \cdots & 0 \\ 0 & \cdots & \delta_2 \end{bmatrix}$$

with $\delta_1$, $\delta_2$ nonzero complex numbers. Thus, every such difference matrix has full rank, and the space-time code
achieves 2-level spatial diversity. Two good trellis codes that satisfy these design rules, and several others that do not,
were handcrafted using computer search methods.

[0029] The fact that this design criterion applies to the complex domain, rather than the discrete domain in which
the codes are designed, has hindered the development of more general results. The following binary rank criterion for
BPSK-modulated, binary space-time codes has also been developed:

[0030] Theorem 2 (Binary Rank Criterion) Let $\mathcal{C}$ be a linear $n \times \ell$ space-time code with underlying binary code $C$
of length $N = n\ell$ where $\ell \ge n$. Suppose that every non-zero code word $\mathbf{c}$ is a matrix of full rank over the binary field
$\mathbb{F}$. Then, for BPSK transmission over the quasi-static fading channel, the space-time code $\mathcal{C}$ achieves full spatial diversity
$nm$.

[0031] Using the binary rank criterion, the following construction for space-time codes is proposed, which is referred
to as the stacking construction.

[0032] Theorem 3 (Stacking Construction) Let $M_1, M_2, \ldots, M_n$ be binary matrices of dimension $k \times \ell$, $\ell \ge k$, and
let $\mathcal{C}$ be the $n \times \ell$ space-time code of dimension $k$ consisting of the code word matrices

$$\mathbf{c} = \begin{bmatrix} \mathbf{x} M_1 \\ \mathbf{x} M_2 \\ \vdots \\ \mathbf{x} M_n \end{bmatrix}$$

where $\mathbf{x}$ denotes an arbitrary $k$-tuple of information bits and $n \le \ell$. Then $\mathcal{C}$ satisfies the binary rank criterion, and thus,
for BPSK transmission over the quasi-static fading channel, achieves full spatial diversity $nm$, if and only if $M_1, M_2, \ldots,
M_n$ have the property that

$$\forall a_1, a_2, \ldots, a_n \in \mathbb{F}: \quad M = a_1 M_1 \oplus a_2 M_2 \oplus \cdots \oplus a_n M_n \text{ is of full rank } k \text{ unless } a_1 = a_2 = \cdots = a_n = 0.$$
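The condition of Theorem 3 can be tested by brute force for small examples. The following Python sketch is an illustration under stated assumptions: the helper names (`gf2_rank`, `satisfies_stacking_condition`, `stacked_codeword`) are hypothetical, and the exhaustive search over all coefficient vectors is only practical for a small number of antennas.

```python
import itertools
import numpy as np

def gf2_rank(matrix):
    """Rank of a binary matrix over GF(2), by Gaussian elimination."""
    a = np.array(matrix, dtype=np.uint8) % 2
    rows, cols = a.shape
    rank = 0
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if a[r, col]), None)
        if pivot is None:
            continue
        a[[rank, pivot]] = a[[pivot, rank]]      # move the pivot row into place
        for r in range(rows):
            if r != rank and a[r, col]:
                a[r] ^= a[rank]                  # clear the column elsewhere
        rank += 1
        if rank == rows:
            break
    return rank

def satisfies_stacking_condition(mats):
    """True if every non-trivial GF(2) combination of the M_i has full rank k."""
    k = mats[0].shape[0]
    for coeffs in itertools.product([0, 1], repeat=len(mats)):
        if not any(coeffs):
            continue
        combo = sum(a * m for a, m in zip(coeffs, mats)) % 2
        if gf2_rank(combo) != k:
            return False
    return True

def stacked_codeword(x, mats):
    """Code word matrix with rows x M_1, ..., x M_n over GF(2)."""
    return np.vstack([(np.array(x) @ m) % 2 for m in mats])
```

As a usage example, the matrices `M1 = [[1,0,0],[0,1,0]]` and `M2 = [[0,1,0],[0,0,1]]` (a delay-diversity arrangement for n = 2, k = 2, and three symbol intervals) satisfy the condition, and every non-zero code word they generate has binary rank 2.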




[0033] It is clear that this construction is general for any number of antennas and, generalized in the obvious fashion,
applies to trellis, as well as block codes. This construction, and a similar version for QPSK transmission (in which
case $\mathcal{Y} = \mathbb{Z}_4$, the integers modulo 4, and $b = 1$), have been shown to encompass, as special cases, transmit delay diversity, the
aforementioned hand-crafted trellis codes, rate $1/n$ convolutional codes, and certain block and concatenated coding
schemes. The generator polynomials for rate $1/n$ convolutional codes with the best minimum distance that achieve full
spatial diversity are discussed in the above-referenced patent application Serial No. 09/397,896.

1.3 Layered Space-Time Architectures 


[0034] In the layered space-time processing approach, the channel encoder 20 of Figure 1 is composite, and the
multiple, independent coded streams are distributed throughout the transmission resource array in layers. The primary
design objective is to design the layering architecture and associated signal processing so that the receiver can efficiently
separate the individual layers from one another and can decode each of the layers effectively. In these schemes,
there is no spatial interference among symbols transmitted within a layer (unlike the space-time code design approach);
hence, conventional channel codes can be used while the effects of spatial interference are addressed primarily in the
signal processor design.

[0035] Different layering schemes are provided for the proposed Bell Laboratories Layered Space-Time (BLAST)
architecture. In the simplest variation, the code words are transmitted in horizontal layers. The preferred scheme, however,
involves the transmission of code words in diagonal layers. The notion of a layer is generalized herein as a section
of the transmission resource array having the property that each symbol interval within the section is allocated to at
most one antenna. This property ensures that all spatial interference experienced by the layer comes from outside the
layer. A layer has the further structural property that a set of spatial and/or temporal cyclic shifts of the layer within the
transmission resource array provides a partitioning of the transmission resource array. This allows for a simple repeated
use of the layer pattern for transmission of multiple, independent coded streams.

[0036] Formally, a layer in an $n \times \ell$ transmission resource array may be identified by an indexing set $L \subset I_n \times I_\ell$ having
the property that the $t$-th symbol interval on antenna $a$ belongs to the layer if and only if $(a, t) \in L$. Then, a layer
requires that, if $(a, t) \in L$ and $(a', t') \in L$, then either $t \ne t'$ or $a = a'$, i.e., that $a$ is a function of $t$.

[0037] Now, consider a composite channel encoder $\gamma$ consisting of $n$ constituent encoders $\gamma_1, \gamma_2, \ldots, \gamma_n$ operating on
independent information streams. Let

$$\gamma_i : \mathcal{Y}^{k_i} \to \mathcal{Y}^{N_i},$$

so that $k = k_1 + k_2 + \cdots + k_n$ and $N = N_1 + N_2 + \cdots + N_n$. Then, there is a partitioning
$\mathbf{u} = \mathbf{u}_1 \,|\, \mathbf{u}_2 \,|\, \cdots \,|\, \mathbf{u}_n$ of the composite information vector $\mathbf{u} \in \mathcal{Y}^k$ into a set of disjoint component vectors $\mathbf{u}_i$, of
length $k_i$, and a corresponding partitioning $\gamma(\mathbf{u}) = \gamma_1(\mathbf{u}_1) \,|\, \gamma_2(\mathbf{u}_2) \,|\, \cdots \,|\, \gamma_n(\mathbf{u}_n)$ of the composite code word $\gamma(\mathbf{u})$ into a set of
constituent code words $\gamma_i(\mathbf{u}_i)$ of length $N_i$. In the layered architecture approach, the space-time transmitter assigns
each of the constituent code words $\gamma_i(\mathbf{u}_i)$ to one of a set of $n$ disjoint layers. For simplicity, consider the case in which
the constituent codes are all of the same rate and have the same code word length:

$$N_i = N/n$$

and

$$k_i = k/n$$

for all $i$.

[0038] There is a corresponding decomposition of the spatial modulating function that is induced by the layering.
Let $f_i$ denote the component spatial modulating function, associated with layer $L_i$,
which agrees with the composite spatial modulator $f$ regarding the modulation and formatting of the layer elements but
which sets all off-layer elements to complex zero. Then

$$f(\gamma(\mathbf{u})) = f_1(\gamma_1(\mathbf{u}_1)) + f_2(\gamma_2(\mathbf{u}_2)) + \cdots + f_n(\gamma_n(\mathbf{u}_n)).$$



[0039] In the V-BLAST architecture, the transmitter uses n conventional channel encoders and permanently
assigns the output of each encoder to one of the n transmit antennas. This corresponds to a partitioning of the transmission
resource array into the horizontal layers

$$L_i = \{(i, t) : 0 \le t < \ell\}, \qquad (3)$$

where $\ell = N/(nb)$. Better performance is achieved by the preferred D-BLAST architecture in which the output of each
encoder is distributed among the n antennas along the diagonal layers

$$L_i = \{(\lfloor t/w \rfloor_n - i + 2,\ t) : (i - 1)w \le t < \ell - (n - i)w\}, \qquad (4)$$

where $w = N/(n^2 b)$ is the width of the diagonal, $\ell = (2n - 1)w$ is the temporal span, and $\lfloor \cdot \rfloor_n$ denotes the function
returning the integer part of a real-valued input reduced modulo $n$.
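The two layerings can be made concrete by enumerating their index sets. The sketch below is illustrative only. The function names are hypothetical, antennas are numbered from 1 and symbol intervals from 0, and the diagonal layer uses one reading of (4) in which the antenna index is taken as floor(t/w) - i + 2 without a further modulo reduction, since that value already lies between 1 and n over the stated range of t.

```python
def horizontal_layer(i, ell):
    """V-BLAST layer i per (3): antenna i carries every symbol interval."""
    return {(i, t) for t in range(ell)}

def diagonal_layer(i, n, w):
    """D-BLAST layer i (1 <= i <= n) over the time range given in (4).

    Assumption: antenna index = floor(t / w) - i + 2, with no modulo applied.
    """
    ell = (2 * n - 1) * w
    return {(t // w - i + 2, t) for t in range((i - 1) * w, ell - (n - i) * w)}

def is_layer(index_set):
    """A layer allocates each symbol interval to at most one antenna."""
    times = [t for _, t in index_set]
    return len(times) == len(set(times))
```

Under these assumptions each diagonal layer holds n·w symbol positions, the n diagonal layers are pairwise disjoint, and the corner positions of the array that belong to no layer correspond to the zero entries of the code word matrix.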

[0040] The BLAST receiver uses a multi-user detection strategy based on a combination of interference cancellation
and avoidance. In D-BLAST, each diagonal layer constitutes a complete code word, so decoding is performed layer
by layer. Consider the code word matrix 32 shown in Figure 2. The entries below the first diagonal layer 34 are zeros.
To decode the first diagonal 34, the receiver generates a soft decision statistic for each entry in that diagonal. In doing
so, the interference from the upper diagonals is avoided by projecting the received signal onto the null-space of the
upper interference. The soft statistics are then used by the corresponding channel decoder to decode this diagonal. The
decoder output is then fed back to cancel the first diagonal contribution in the interference while decoding the next diagonal.
The receiver then proceeds to decode the next diagonal in the same manner.

[0041] This zero-forcing strategy is only possible if the number of receive antennas m is at least as large as the
number of transmit antennas n. Zero-forcing also results in a loss in achievable diversity order that depends on the
number of interferers to be avoided. For example, the symbol in the uppermost position will have the maximum diversity
order m, whereas the symbol in the lowermost position will have the minimum diversity order 1. Thus, the diagonal layering
of the encoded stream is necessary to achieve equal performance for all coded streams. Due to the interference
cancellation mechanism, errors can also propagate spatially.
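The interference avoidance step can be sketched as a projection onto the orthogonal complement of the interferers' path-gain vectors. The code below is a minimal illustration, not the BLAST receiver itself: the function name `nulling_projector` is hypothetical, and the interfering columns are assumed to be linearly independent.

```python
import numpy as np

def nulling_projector(interferers):
    """Projector onto the orthogonal complement of the interferers' column space.

    interferers : m x k complex matrix whose columns are the path-gain vectors
                  of the k transmit antennas to be avoided (k >= 1).
    """
    m = interferers.shape[0]
    # Orthonormal basis of the interference subspace (reduced QR decomposition).
    q, _ = np.linalg.qr(interferers)
    return np.eye(m) - q @ q.conj().T
```

Applying the projector to the received vector removes the contribution of the k interferers and leaves m - k dimensions for the desired symbol. With m = n, a symbol with no interferers to avoid keeps all m dimensions, while a symbol with n - 1 interferers keeps only one, which is consistent with the diversity orders m and 1 noted above.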

1.4 Multi-Layered Space-Time Architectures

[0042] Multi-layered space-time processing is a hybrid approach involving use of both space-time channel codes
and layered processing, as illustrated in Figure 3. Space-time codes 36a through 36n are used in a conventional manner;
however, the number of antennas is limited to facilitate group processing. Since code words are no longer transmitted
in a single layer, there is spatial interference among transmitted symbols within a given code word that should be
addressed as part of the channel code design.

[0043] A group interference suppression technique has been proposed. In this scheme, the input stream is divided, for
example, into $n/n'$ substreams. The different substreams are encoded using $n'$-level diversity component trellis codes
$C_1, \ldots, C_{n/n'}$. Each component code is then transmitted from $n'$ antennas (horizontal $n'$-layering). At the receiver, each
component code is decoded separately while suppressing signals from other component codes. The group interference
suppression strategy is based on the zero-forcing principle and requires that $m \ge n - n' + 1$. In a quasi-static fading
channel, the spatial diversity gain achieved by $C_1$ is $n' \times (m - n + n')$. Assuming correct decoding of $C_1$, its contribution
is subtracted from signals at different receive antennas. This gives a communication system with $n - n'$ transmit and
$m$ receive antennas. Hence, the space-time code $C_2$ affords a diversity gain of $n' \times (m - n + 2n')$, and so on. Using
the fact that the diversity gain increases with each decoding stage, unequal power levels are allocated to the different
component codes. Because all of the aforementioned space-time codes were 2-level diversity codes, except for the
delay diversity, known examples were limited to $n' = 2$.
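The stage-by-stage diversity gains can be tabulated with a few lines of code. This sketch assumes that the pattern stated for the first two component codes continues unchanged, so that the j-th decoded code has gain n'(m - n + j n'). The function name is hypothetical.

```python
def stage_diversity_gains(n, m, n_prime):
    """Diversity gain n' * (m - n + j * n') for decoding stages j = 1 .. n/n'."""
    if n % n_prime != 0:
        raise ValueError("n must be a multiple of n'")
    if m < n - n_prime + 1:
        raise ValueError("zero forcing requires m >= n - n' + 1")
    return [n_prime * (m - n + j * n_prime) for j in range(1, n // n_prime + 1)]
```

For example, with n = 4 transmit antennas, m = 4 receive antennas, and n' = 2, the gains are 4 for the first decoded code and 8 for the second. The last stage always reaches n' × m because no interfering groups remain.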

[0044] The performance of this architecture was shown to be within 6 - 9 dB from the outage capacity at a frame error
rate of $10^{-1}$.

2. Multi-User Detection for Space-Time Applications 

[0045] In accordance with the present invention, the problem of space-time signal processing is considered to be a 
multi-user detection problem. Iterative multi-user detection algorithms are provided and their advantages over zero-forc- 
ing strategies in layered and multi-layered space-time architectures will be discussed below. 

2.1 Optimal Multi-User Detection 

[0046] Consider the layered space-time architecture in which $n$ binary channel encoders of rate $r$ and constraint
length $\nu$ are used, and each encoder output is assigned to a different layer. It is clear that this system is equivalent to a
synchronous code division multiple access (CDMA) system with $n$ users and a spreading gain of $m$, where the complex fading
coefficients constitute the equivalent spreading sequences. In general, $m < n$ corresponds to an overloaded CDMA
system. In such a scenario, the optimum receiver for joint detection and decoding combines the trellises of both the
multi-user detector and the channel decoder. This receiver can be realized using a Viterbi algorithm whose complexity
is of exponential order $O(2^{n\nu})$ in the product of the number of antennas and the code constraint length. For some systems,
the exponential increase in implementation complexity may make the optimal receiver impractical for even a relatively
small number of antennas. Thus, there is a need for alternate receiver architectures that are less complex but
still efficient.

2.2 Iterative Multi-User Detection 

[0047] In this section, the turbo-processing principle is used to derive a set of iterative multi-user detection algorithms
that allow trade-offs to be made between performance and complexity. A block diagram of the iterative receiver
40 is shown in Figure 4. For simplicity, horizontal layering with binary channel codes (n binary channel encoders coupled
to n transmit antennas) and BPSK modulation are assumed. Extension to nonbinary codes and to the multi-layered
architecture is straightforward.

[0048] With reference to Figure 4, a received signal is processed by a matched filter bank 42 and an estimation
module 52. A soft-input/soft-output (SISO) multi-user detector module 44 provides joint soft-decision estimates of the n
streams of data. Each of the detected streams is decoded by one of the separate SISO channel decoders 48a through 48n
associated with the component channel codes. The detected streams are deinterleaved, as indicated at 46, prior to
decoding. The output of each decoder is interleaved again, as indicated at 50a through 50n, to facilitate interleave
processing by the multi-user detector. After each decoding iteration, the soft outputs from the channel decoders 48a
through 48n are used to refine the processing performed by the SISO multi-user detector 44. In the iterative receiver
40, each of the streams is independently interleaved to facilitate convergence. This aspect of the receiver 40 also influences
channel code design.
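The independent per-stream interleaving can be sketched as follows. This is an illustration only: the helper names are hypothetical, and pseudo-random permutations are an assumed choice of interleaver, since the description does not fix a particular interleaver design.

```python
import numpy as np

def make_interleavers(n_streams, length, seed=0):
    """One independent pseudo-random permutation per stream (assumed design)."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(length) for _ in range(n_streams)]

def interleave(stream, perm):
    """Reorder a stream of soft values or symbols according to perm."""
    return stream[perm]

def deinterleave(stream, perm):
    """Undo interleave(): place each value back at its original position."""
    out = np.empty_like(stream)
    out[perm] = stream
    return out
```

In an iterative loop, the detector outputs for stream i would be passed through `deinterleave` with that stream's permutation before decoding, and the decoder's soft outputs would be passed through `interleave` with the same permutation before being fed back to the detector.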

[0049] The SISO channel decoders 48a through 48n can employ any of the following algorithms: (1) the maximum
a-posteriori (MAP) approach, which is optimal in the sense that it minimizes the probability of bit error at the decoder
output; (2) the log-MAP approach, which is a lower complexity, additive version of the MAP rule that operates in the
log-domain; or (3) the soft output Viterbi algorithm (SOVA). The choice of the decoding technique depends on the available
processing power at the receiver 40.
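Working in the log-domain turns products of probabilities into sums, and sums of probabilities into a pairwise operation often called the Jacobian logarithm or max-star. The sketch below shows that operation as a general illustration of how log-MAP decoders are commonly implemented. It is not taken from this description, and the function name is a conventional one.

```python
import math

def max_star(a, b):
    """Exact log(exp(a) + exp(b)), computed without overflow."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))
```

Dropping the correction term `log1p(...)` leaves a plain maximum, which is the usual lower-complexity approximation to this operation.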

[0050] The overall complexity of the iterative receiver 40 depends primarily on the algorithm used by the multi-user
detector 44. Therefore, three SISO multi-user detection algorithms that provide a trade-off between performance and
complexity are developed. The first is based on the maximum a-posteriori (MAP) probability rule; the second is based
on the minimum mean square error (MMSE) criterion; and the third can be viewed as a suboptimal approximation of the
iterative MMSE receiver.

[0051] In all cases, the derivations require an assumption of statistical independence of the spatial soft decision 
information. This assumption is sufficiently satisfied in practice by requiring that the transmissions from each of the 
antennas be independently interleaved. 

2.2.1 Iterative MAP Receiver 

[0052] In this case, the SISO multi-user detector 44 computes the symbol-by-symbol maximum a posteriori (MAP)
statistics. Specifically, the soft decision statistic for $c_t^{(i)}$ is updated iteratively via the following rule:

$$\lambda(c_t^{(i)}) = \log \frac{\sum_{c_t \in C_i^{+}} P(r_t \mid c_t)\, P(c_t)}{\sum_{c_t \in C_i^{-}} P(r_t \mid c_t)\, P(c_t)}$$

where $P(r_t \mid c_t)$ is the conditional multivariate complex Gaussian distribution of the received vector;
$P(c_t)$ is the joint a-priori probability distribution of the transmitted symbols; and

$$C_i^{+} = \{(c_t^{(1)}, \ldots, c_t^{(i-1)}, +1, c_t^{(i+1)}, \ldots, c_t^{(n)}) : c_t^{(j)} \in \{-1, 1\}\}$$ (6)

$$C_i^{-} = \{(c_t^{(1)}, \ldots, c_t^{(i-1)}, -1, c_t^{(i+1)}, \ldots, c_t^{(n)}) : c_t^{(j)} \in \{-1, 1\}\}$$ (7)



[0053] The computation of the joint distribution $P(c_t)$ is intractable in general without further assumptions. If
statistical independence is assumed, then

$$p(c_t^{(1)}, \ldots, c_t^{(i-1)}, c_t^{(i+1)}, \ldots, c_t^{(n)}) = p(c_t^{(1)}) \cdots p(c_t^{(i-1)})\, p(c_t^{(i+1)}) \cdots p(c_t^{(n)})$$ (8)

In the first iteration, one takes

$$P(c_t^{(j)} = 1) = P(c_t^{(j)} = -1) = 1/2.$$

In subsequent iterations, the a-priori probabilities are re-computed based on the previous iteration's extrinsic
information, $\lambda_e$, corresponding to the symbol transmitted from the j-th antenna at time t:

$$P(c_t^{(j)} = 1) = \frac{e^{\lambda_e}}{1 + e^{\lambda_e}}, \qquad P(c_t^{(j)} = -1) = \frac{1}{1 + e^{\lambda_e}}$$ (9)

[0054] Note that, while the fading is assumed independent for each transmit-receive antenna pair, the extrinsic
information is generally correlated. The independence assumption is therefore invalidated, unless the separate antenna
transmissions are independently interleaved. When different interleaving is used for each antenna transmission,
however, the independence assumption is reasonably well approximated.

[0055] The MAP approach is used for CDMA applications. For the iterative MAP decoder, the number of terms in
each of the summations is preferably $2^{n-1}$. Hence, the complexity of the MAP detector per iteration is $O(n 2^n)$, and the
overall complexity of the receiver, per iteration, is $O(n[2^n + 2^{\nu}])$.
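The exponential term count of the MAP detector can be seen in a direct implementation. The following Python sketch is a minimal illustration under stated assumptions and is not the disclosed detector 44: it assumes BPSK symbols in {-1, +1}, a received vector modeled as `r = H c + noise` with complex Gaussian noise of variance `N0` per receive antenna, and independent priors supplied as log-likelihood ratios. The names `map_detector_llrs`, `H`, `prior_llr`, and `N0` are hypothetical.

```python
import itertools

import numpy as np


def map_detector_llrs(r, H, prior_llr, N0):
    """Symbol-by-symbol MAP LLRs for BPSK over r = H c + noise.

    r: (m,) complex received vector; H: (m, n) complex channel matrix;
    prior_llr: (n,) a-priori LLRs log P(c=+1)/P(c=-1); N0: noise variance.
    The prior of the symbol being detected is excluded from its own LLR.
    """
    n = H.shape[1]
    # Prior probability of +1 for each antenna symbol, from its LLR.
    p_plus = 1.0 / (1.0 + np.exp(-np.asarray(prior_llr, dtype=float)))
    llrs = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        num = 0.0  # sum over the 2**(n-1) vectors with c_i = +1
        den = 0.0  # sum over the 2**(n-1) vectors with c_i = -1
        for bits in itertools.product((-1.0, 1.0), repeat=n - 1):
            # Independence assumption: the joint prior is a product.
            prior = 1.0
            c = np.empty(n)
            for j, cj in zip(others, bits):
                prior *= p_plus[j] if cj > 0 else 1.0 - p_plus[j]
                c[j] = cj
            for ci in (1.0, -1.0):
                c[i] = ci
                like = np.exp(-np.linalg.norm(r - H @ c) ** 2 / N0)
                if ci > 0:
                    num += like * prior
                else:
                    den += like * prior
        llrs[i] = np.log(num / den)
    return llrs
```

Each of the n symbols requires two sums of $2^{n-1}$ terms, which is why this approach becomes costly as the number of transmit antennas grows.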



2.2.2 Iterative MMSE Receiver 



[0056] In this scheme, the SISO multi-user detection module 44 is based on the MMSE criterion. After each decoding
iteration via 48a through 48n, the soft outputs are used to update the a-priori probabilities of the transmitted
symbols. These updated probabilities are then used to calculate the MMSE filter feed-forward and feedback weights in the
multi-user detection module, as indicated at 44a and 44b, respectively, in Figure 5. The feedback connections 44b
represent the subtractive interference cancellation part of the receiver, while the feed-forward weights 44a serve to
suppress any residual interference.

[0057] The set of equations describing the filter coefficients used for generating the soft decision statistic
corresponding to $c_t^{(i)}$ will be derived. The subscript t is omitted for convenience. Hence, the MMSE estimate
$y^{(i)}$ of the i-th antenna symbol at time t is given by

$$y^{(i)} = w_f^{(i)T} r + w_b^{(i)}$$ (10)

where $w_f^{(i)}$ is the m x 1 optimized feed-forward coefficients vector and $w_b^{(i)}$ is a single coefficient that
represents the soft cancellation part. The coefficients $w_f^{(i)}$ and $w_b^{(i)}$ are obtained through minimizing the
mean square value of the error $E[|y^{(i)} - c^{(i)}|^2]$ between the data symbol and its estimate. Hence the
coefficients minimize

$$E\left[\left| w_f^{(i)T}\left(s^{(i)} c^{(i)} + S^{(n/i)} c^{(n/i)} + \mathbf{n}\right) + w_b^{(i)} - c^{(i)} \right|^2\right]$$ (11)

where $s^{(i)}$ is the m x 1 complex signature vector of the i-th transmit antenna; $S^{(n/i)}$ is the m x (n - 1) matrix
composed of the complex signature vectors of the other n - 1 transmit antennas 18; and $c^{(n/i)}$ is the (n - 1) x 1
transmitted data vector from the other n - 1 transmit antennas 18. Using standard minimization techniques, it is easily
shown that the MMSE solutions for $w_f^{(i)}$ and $w_b^{(i)}$ satisfy the relations:

$$w_f^{(i)T}\left[s^{(i)} s^{(i)H} + S^{(n/i)} E[c^{(n/i)} c^{(n/i)T}] S^{(n/i)H} + E[\mathbf{n}\mathbf{n}^H]\right] + w_b^{(i)} E[c^{(n/i)T}] S^{(n/i)H} = s^{(i)H}$$ (12)

$$w_f^{(i)T} S^{(n/i)} E[c^{(n/i)}] + w_b^{(i)} = 0$$ (13)

where

$$E[\mathbf{n}\mathbf{n}^H] = N_0 I_{m \times m}$$ (14)

$$E[c^{(n/i)}] = \bar{c}^{(n/i)}$$ (15)



[0058] At this point, statistical independence is assumed once again. This assumption is justified through the
different interleaving used by each transmit antenna 18. Then

$$E[c^{(n/i)} c^{(n/i)T}] = I_{(n-1) \times (n-1)} - \mathrm{Diag}(\bar{c}^{(n/i)} \bar{c}^{(n/i)T}) + \bar{c}^{(n/i)} \bar{c}^{(n/i)T}$$ (16)

Here $I_{m \times m}$ is the identity matrix of order m; $\bar{c}^{(n/i)}$ is the (n - 1) x 1 vector of the expected values
of the transmitted symbols from the other n - 1 antennas. The a-priori probabilities used to evaluate these expected
values are obtained from the previous decoding iteration soft outputs, through the component-wise relation (9).
[0059] To simplify notation, the following definitions are made:

$$A = s^{(i)} s^{(i)H}$$ (17)

$$B = S^{(n/i)}\left[I_{(n-1) \times (n-1)} - \mathrm{Diag}(\bar{c}^{(n/i)} \bar{c}^{(n/i)T}) + \bar{c}^{(n/i)} \bar{c}^{(n/i)T}\right] S^{(n/i)H}$$ (18)

$$F = S^{(n/i)} \bar{c}^{(n/i)}$$ (19)

$$R_n = N_0 I_{m \times m}$$ (20)

Solving (12) and (13) for the optimum filter feed-forward and feedback coefficients, the following coefficients are
obtained:

$$w_f^{(i)T} = s^{(i)H}\left(A + B + R_n - F F^H\right)^{-1}$$ (21)

$$w_b^{(i)} = -w_f^{(i)T} F$$ (22)
The log-likelihood ratio is now given by

$$\lambda(c^{(i)}) = \mathrm{Re}\{w_f^{(i)T} r + w_b^{(i)}\}$$ (23)

[0060] In the first decoding iteration, the transmitted symbols are assumed to have a uniform distribution; hence,
$\bar{c}^{(n/i)} = 0$. The feed-forward filter coefficients vector, $w_f^{(i)}$, in this iteration is given by relations
similar to the MMSE equations derived in the real domain. The relations of the present invention, however, are in the
complex domain because of the complex spreading codes, and the feedback coefficient $w_b^{(i)} = 0$. After each
iteration, the expected values $\bar{c}^{(n/i)}$ are recalculated using the decoders' soft outputs. The values
$\bar{c}^{(n/i)}$ are then used to generate the new set of filter coefficients as described. In the asymptotic case, when
$\bar{c}^{(n/i)} = c^{(n/i)}$, the receiver is equivalent to the subtractive interference canceler. This is expected, since
$\bar{c}^{(n/i)} = c^{(n/i)}$ means that the previous iteration decisions, for the other antenna symbols, are error free.
Under this assumption, the subtractive interference canceler becomes the optimum solution.

[0061] The direct implementation of the receiver 16 employing iterative MMSE requires a complexity of polynomial 
order in the number of transmit antennas 18. Adaptive techniques can be used to reduce implementation complexity. 
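The matrix operations behind the iterative MMSE weights can be sketched numerically. The following Python example is a minimal illustration, not the disclosed implementation: it assumes real BPSK symbols, takes the expected symbol as tanh of half the log-likelihood ratio (a standard consequence of defining the soft output as log P(+1)/P(-1)), and computes the weights from the quantities A, B, F and $R_n$ of the definitions (17) through (20), with the feed-forward vector obtained as $s^H (A + B + R_n - FF^H)^{-1}$. The names `soft_symbol_means`, `mmse_weights`, and `mmse_estimate` are hypothetical.

```python
import numpy as np


def soft_symbol_means(llr):
    """Expected BPSK symbol E[c] = tanh(llr / 2) for c in {-1, +1}."""
    return np.tanh(np.asarray(llr, dtype=float) / 2.0)


def mmse_weights(s, S, c_bar, N0):
    """Return (w_f_T, w_b): feed-forward weights (as w_f^T) and feedback term.

    s: (m,) complex signature of the desired antenna;
    S: (m, n-1) complex signatures of the other antennas;
    c_bar: (n-1,) expected values of the interfering symbols;
    N0: noise variance per receive antenna.
    """
    m = s.shape[0]
    c_bar = np.asarray(c_bar, dtype=float)
    A = np.outer(s, s.conj())
    # E[c c^T] under independence: unit diagonal, c_bar_j * c_bar_k elsewhere.
    cov = np.eye(len(c_bar)) - np.diag(c_bar ** 2) + np.outer(c_bar, c_bar)
    B = S @ cov @ S.conj().T
    F = S @ c_bar
    R_n = N0 * np.eye(m)
    M = A + B + R_n - np.outer(F, F.conj())
    # w_f^T = s^H M^{-1}; solve M^T x = conj(s) instead of inverting M.
    w_f_T = np.linalg.solve(M.T, s.conj())
    w_b = -w_f_T @ F
    return w_f_T, w_b


def mmse_estimate(r, w_f_T, w_b):
    """Soft estimate of the desired symbol from the received vector r."""
    return w_f_T @ r + w_b
```

The linear solve over an m x m matrix for each symbol is the source of the polynomial complexity; with `c_bar` set to zero the feedback term vanishes, and with `c_bar` equal to the true symbols the feed-forward vector reduces to a scaled matched filter.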

2.2.3 Iterative Soft Interference Cancellation

[0062] The main source of complexity in the iterative MMSE approach is the matrix inversion operation required to
compute the filter feed-forward coefficients (21). This observation motivates the following suboptimal approach. The
estimate (10) can be rewritten as

$$y^{(i)} = w_f^{(i)T}\left(r - S^{(n/i)} \bar{c}^{(n/i)}\right)$$ (24)

Then, if the matched filter

$$w_f^{(i)T} = s^{(i)H} = [s_1^{(i)*}, \ldots, s_m^{(i)*}]$$

is used, the need for the matrix inversion operation in (21) is eliminated. The resulting receiver has a linear complexity,
per iteration, in the number of transmit antennas.
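The matched-filter variant can be sketched in a few lines. The following Python example is an illustration under the assumption that the soft estimate is formed by subtracting the expected interference and correlating with the conjugate signature of the desired antenna; the name `soft_ic_estimate` and its arguments are hypothetical.

```python
import numpy as np


def soft_ic_estimate(r, s, S, c_bar):
    """Matched-filter estimate after soft interference cancellation.

    r: (m,) received vector; s: (m,) signature of the desired antenna;
    S: (m, n-1) signatures of the other antennas;
    c_bar: (n-1,) expected values of the interfering symbols.
    """
    # Subtract the expected contribution of the other antennas.
    residual = r - S @ np.asarray(c_bar, dtype=float)
    # np.vdot conjugates its first argument, giving s^H (r - S c_bar).
    return np.vdot(s, residual)
```

No matrix is inverted: the cost per symbol is one matrix-vector product and one inner product, both linear in the number of transmit antennas.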

2.2.4 Trade-Offs

[0063] The receiver 16 employing iterative MAP offers a substantial reduction in complexity compared to the optimal
receiver, but its complexity is exponential and could be prohibitive for systems with medium to large numbers of
antennas. The receiver 16 employing iterative MMSE has polynomial complexity. The soft interference cancellation
method is the least complex.

[0064] The receiver 16 employing iterative MMSE has an important advantage over the other two iterative 
approaches in that it can suppress the interference from other space-time users without the need to decode all of the 
signals. In the iterative MAP and the soft interference cancellation techniques, undecoded signals are treated as white 
Gaussian noise. Hence, both approaches can have a near-far problem from undecoded space-time users. In the itera- 
tive MMSE, the other users' interference can be suppressed by the feed-forward filter coefficients without the need to 
actually decode the other users' signals. Prior knowledge of the other users' spreading codes, that is, path gains, is 
needed; although, an adaptive algorithm based on a combination of iterative cancellation and adaptive subspace pro- 
jection can be used. 

2.3 Application to Layered Space-Time Architectures 

[0065] The iterative multi-user techniques can be implemented for either of the layered or multilayered transmission 
formats described above provided that the transmitter 18 is modified so that the output of each channel encoder 20 is
interleaved independently before transmission. The principal advantages of the iterative techniques in both settings are 
briefly discussed below. In Section 3, a generalized layered architecture with optimized channel coding is presented 
that more effectively exploits the diversity available in the system 10. 

2.3.1 Layered Architecture 

[0066] The iterative techniques of the present invention offer several advantages over the detection technique pro- 
posed in the LST. First, unlike LST, neither iterative approach requires that m>n. Second, the probability of error prop- 
agation is reduced in the presented algorithms through the feedback of soft information instead of hard decisions. More 
importantly, the iterative approach of the present invention strives to suppress interference from other layers with mini- 
mal loss in achieved diversity order. Therefore, these techniques achieve a better performance than LST. 
[0067] Assuming error-free feedback, the capacity of the LST with m = n is

$$C_{LB} = \sum_{k=1}^{m} \log_2\left[1 + (\rho/n)\, \chi_{2k}^2\right]$$ (25)

where $\rho$ is the signal-to-noise ratio at the input of each receive antenna, and $\chi_{2k}^2$ are independent
chi-squared random variables with 2k degrees of freedom. The lower bound converges at high enough signal-to-noise
ratios to the actual system capacity, so that LST achieves capacity asymptotically.

[0068] With the proposed iterative multi-user detection algorithms of the present invention, the ideal performance
under error-free feedback is close to the upper bound

$$C_{UB} = n \log_2\left[1 + (\rho/n)\, \chi_{2m}^2\right]$$ (26)

At low signal-to-noise ratios, the difference between the two bounds is considerable, indicating the superiority of the
iterative techniques at small to medium signal-to-noise ratios. The simulation results in Section 5 show that the iterative
MMSE receiver 16 achieves more than 3 dB gain over the LST detection algorithm at 1% frame error rate.
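The gap between the two bounds can be explored by simulation. The following Python sketch is a rough Monte Carlo illustration and not part of the simulation results cited above. It assumes that a chi-squared variable with 2k degrees of freedom is normalized to have mean k (the sum of k unit-mean exponential terms), a normalization that is an assumption of this sketch and is not stated in the text; the function name `capacity_bounds` and its parameters are hypothetical.

```python
import numpy as np


def capacity_bounds(rho_db, n, trials=20000, seed=0):
    """Monte Carlo means of the lower and upper bounds for m = n antennas.

    rho_db: signal-to-noise ratio in dB; n: number of antennas.
    Returns (mean lower bound, mean upper bound) in bits/s/Hz.
    """
    rng = np.random.default_rng(seed)
    rho = 10.0 ** (rho_db / 10.0)
    m = n
    k = np.arange(1, m + 1)
    # Assumed normalization: chi-squared with 2k d.o.f. divided by 2 (mean k).
    chi_lb = rng.chisquare(2 * k, size=(trials, m)) / 2.0
    lower = np.log2(1.0 + (rho / n) * chi_lb).sum(axis=1)
    chi_ub = rng.chisquare(2 * m, size=trials) / 2.0
    upper = n * np.log2(1.0 + (rho / n) * chi_ub)
    return lower.mean(), upper.mean()
```

Sweeping `rho_db` over a range of values gives a quick view of how the relative difference between the two averages behaves at low and high signal-to-noise ratios under the stated normalization.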

2.3.2 Multi-Layered Architecture 

[0069] The main advantage of the existing multi-layered scheme over the existing BLAST architecture is the use of 
space-time component codes rather than conventional channel codes. Space-time codes have the advantage of 
exploiting the diversity provided by the multiple transmit antennas but have the disadvantage that the complexity of the 
ML decoder is exponential in the number of transmit antennas used by each component code. The design of space- 
time codes for use in conjunction with the proposed iterative MAP and MMSE detection algorithms, however, is made 
more complicated by the use of independent interleaving of each antenna transmission. Random interleaving applied 
to each antenna stream may reduce the diversity advantage achieved by the space-time code. In the case of the space- 
time trellis codes, it appears a difficult task to verify that an interleaved version would still achieve full spatial diversity 
since the codes are handcrafted. A straightforward method has been proposed for analyzing the original codes and
demonstrating that they achieve full spatial diversity, but the method is not readily extensible to the interleaved case. No
systematic method for designing interleaved versions that retain full spatial diversity is known.

[0070] The iterative MAP and MMSE algorithms of the present invention can be applied, however, to the algebraic
space-time code designs for BPSK or QPSK modulation. Assume BPSK transmission and a fixed code rate 1/n', where
n' divides n. The input stream is divided into n/n' streams. Each stream is then independently encoded using the
natural space-time code produced by a rate 1/n' convolutional encoder. The generator polynomials for full spatial diversity
codes of this type are listed in Table I in Serial No. 09/397,896, for different code constraint lengths and numbers n' of
transmit antennas 18. Each output arm from each encoder is independently interleaved and transmitted from a
different antenna. In order to ensure that the resulting space-time code retains full spatial diversity, it is enough to verify
that the generator matrices corresponding to the interleaved branches of the convolutional code satisfy the stacking
construction condition in Theorem 3. A random search strategy is very efficient in finding interleavers that satisfy the
stacking construction.

[0071] The direct computation of the diversity order d achieved by the soft iterative decoder is a daunting task.
Heuristically, the approximate bounds are as follows:

$$n'(m - n + 1) \le d \le n'm$$ (27)

The upper bound corresponds to the maximum diversity achievable by a single space-time component code using n'
transmit antennas assuming ML decoding in the absence of competing transmissions from the other component codes
on the remaining antennas (the ideal case). The lower bound corresponds to the diversity achieved by a receiver that
uses zero-forcing to detect the signal transmitted from its antennas (assuming m ≥ n) and ML component decoders. Its
use as an approximate lower bound for the iterative techniques is justified by the fact that the iterative MMSE receiver
16 with a single iteration achieves the same asymptotic performance at high signal-to-noise ratios as the zero-forcing
receiver and should outperform the zero-forcing receiver at low signal-to-noise ratios since the MMSE criterion seeks
to maximize total signal-to-noise-and-interference ratio. The bound also applies to the iterative MAP algorithm since it
outperforms the MMSE technique.

[0072] Guided by the excellent performance of the iterative MMSE receiver in CDMA applications, the achieved
diversity is expected to be close to the upper bound of n'm. The approximate bounds are compared to the 2(m - n + 2)
diversity advantage achieved by the joint trellis space-time coding and group interference suppression.
[0073] The architecture just described suffers from two main drawbacks. First, it is not applicable to arbitrary
constellations. This is due to the limited applicability of the binary rank criteria to BPSK and QPSK constellations. Second,
it does not efficiently exploit the temporal diversity embedded in the block fading channel. These limitations are avoided
in the new approach presented in the next section.

3. The Threaded Space-Time Approach 

[0074] In this section, a generic approach for space-time transmitter/receiver design is presented in accordance
with the present invention which combines efficient algebraic code design with iterative multi-user detection. In the
proposed approach, an input data stream is divided into multiple threads. Each thread is encoded and interleaved
separately. At each point of time, only one symbol 60 is transmitted from each thread 62, as shown in Figure 6. At the
receiver 16, the iterative multi-user detector serves to separate the different threads 62a, 62b, and so on with minimal
loss in performance. The encoding, interleaving, and distribution of thread symbols among different antennas is
optimized to maximize spatial diversity, temporal diversity, and coding gain for a given transmission rate, assuming no
interference from the other threads. Meanwhile, interleaving is performed in such a way as to maximize the efficiency of the
iterative receiver. While threads can be represented by a diagonal in the matrix 64, as depicted in Figure 6, the symbols
in a thread need not be transmitted by adjacent antennas in respective symbol transmission intervals.

3.1 Threaded Space-Time Architecture 

[0075] As in the generic layered architecture, the transmitter has available a disjoint set of layers,

$$\mathcal{L} = \{L_1, L_2, \ldots, L_n\},$$

and transmits the composite code word

$$\gamma(u) = \gamma_1(u_1) \mid \gamma_2(u_2) \mid \cdots \mid \gamma_n(u_n)$$

by sending $\gamma_i(u_i)$ in layer $L_i$.

[0076] The layer set $\mathcal{L}$ is designed so that each layer is active during all of the available symbol transmission
intervals and, over time, uses each of the n antennas equally often. Thus, during each symbol transmission interval, the
layers each transmit a symbol using a different antenna; and, in terms of antenna usage, all of the layers are equivalent.
A layer satisfying these constraints is referred to as a thread of spatial span n. The simplest example of threaded
layering of temporal span $\ell$ is the set $\mathcal{L}$ in which

$$L_i = \{(\lfloor t + i - 1 \rfloor_n + 1,\ t) : 0 \le t < \ell\}$$ (28)
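The antenna assignment of this simplest threaded layering can be tabulated directly. The following Python sketch is illustrative and assumes that the subscripted floor bracket in (28) denotes reduction modulo n, with antennas numbered 1 through n and symbol intervals numbered from 0; the function names `thread_layers` and `antenna_usage` are hypothetical.

```python
def thread_layers(n: int, span: int):
    """Return a list whose entry i-1 holds the (antenna, t) pairs of thread L_i.

    Antennas are numbered 1..n and symbol intervals t = 0..span-1,
    using L_i = {((t + i - 1) mod n + 1, t) : 0 <= t < span}.
    """
    return [[((t + i - 1) % n + 1, t) for t in range(span)]
            for i in range(1, n + 1)]


def antenna_usage(layer, n: int):
    """Count how many symbol intervals a layer spends on each antenna."""
    counts = [0] * n
    for antenna, _ in layer:
        counts[antenna - 1] += 1
    return counts
```

For example, `thread_layers(4, 8)` yields four threads that each visit every antenna twice, and in every symbol interval the four threads occupy four different antennas.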



[0077] Unlike the layered architectures described above, the design approach of the present invention treats the
coded transmission in each layer as a bona fide space-time code, constructions for which are given in the next section.
Looking at the space-time coding performed on a single layer in isolation, this construction appears to reduce
throughput as a result of silence periods imposed on the different antennas; however, in the overall threaded transmission
scheme, the silent periods on antennas that are not used by a given layer are filled with the transmissions from the other
component space-time codes. Signal processing at the receiver, which is necessary to remove or suppress spatial
interference among the threaded layers, allows high throughput to be achieved. One innovation of the new architecture
is that, under the assumption of error-free interference cancellation, the component space-time codes can be designed
to achieve full spatial diversity without degradation in overall system throughput.

[0078] The space-time architecture of the present invention is not a multi-layer approach since the transmit
positions occupied by the modulated code symbols for a particular thread constitute a single layer. Yet, the architecture of
the present invention is not a layered architecture in the same sense as the BLAST architecture. This is because the
threaded layering is a more general type of layering well-suited for iterative multi-user techniques, and the channel
coding design in the new approach is two-dimensional based on space-time coding principles designed to exploit both the
spatial and temporal diversity. To distinguish this new approach, the architecture of the present invention is referred to
as the threaded space-time (TST) architecture. The three architectures are compared in more detail in Section 4 below.
[0079] The efficiency of the threaded architecture depends on the ability of the receiver to eliminate the interference
coming from the other space-time component codes. In principle, any multi-user detection technique can be used in this
context. The iterative MMSE receiver is used in accordance with the present invention because of its reasonable
complexity and its ability to achieve performance close to the interference-free scenario under different conditions. This
approach is applicable to arbitrary constellations with binary (or non-binary) codes.

3.2 Design of Threaded Space-Time Codes 

[0080] In this section, the design of the component space-time codes used in the threaded architecture is
discussed. The design of these codes follows an algebraic approach introduced in the above-referenced application Serial
No. 09/397,896. The layering provided by the threaded architecture allows the algebraic formulation to be extended to
arbitrary signalling constellations. Importantly, the requirement for independent interleaving in the iterative multi-user
receiver is easily accommodated in these code designs.
[0081] Consider a single threaded layer $L_i$ and the corresponding component space-time code $\mathcal{C}_i$
associated with encoder $\gamma_i$. The spatially modulated code words of $\mathcal{C}_i$ are the n x (N/b) complex
matrices $f_{L_i}(\gamma_i(u_i))$. To simplify notation, the indices are not used, letting $\mathcal{C} = \mathcal{C}_i$
and $g = \gamma_i$. Let $f_L$ denote the component spatial modulator function associated with layer L. Unsubscripted
vectors such as x or y are used herein to refer to the information stream.

[0082] For the design of the space-time code $\mathcal{C}$ associated with thread L, the following stacking construction
using binary matrices for the quasi-static fading channel is employed.
[0083] Theorem 4 (Threaded Stacking Construction) Let L be a threaded layer of spatial span n. Given binary
matrices $M_1, M_2, \ldots, M_n$ of dimension k x $b\ell$, let C be the binary code of dimension k consisting of all code
words of the form

$$g(x) = xM_1 \mid xM_2 \mid \cdots \mid xM_n,$$

where x denotes an arbitrary k-tuple of information bits. Let $f_L$ denote the spatial modulator having the property that
$xM_i$ is transmitted in the $\ell$ symbol intervals of L that are assigned to antenna i.

[0084] Then, as the space-time code in a communication system with n transmit antennas and m receive antennas,
the space-time code $\mathcal{C}$ consisting of C and $f_L$ achieves spatial diversity dm in a quasi-static fading channel
if and only if d is the largest integer such that $M_1, M_2, \ldots, M_n$ have the property that

$$\forall a_1, a_2, \ldots, a_n \in \mathbb{F},\ a_1 + a_2 + \cdots + a_n = n - d + 1:$$

$$M = [a_1 M_1\ a_2 M_2\ \cdots\ a_n M_n] \text{ is of rank } k \text{ over the binary field.}$$

[0085] Proof: Due to the lack of spatial interference within a layer, the baseband rank criterion is straightforward to
apply. In particular, note that the baseband difference $f_L(g(x)) - f_L(g(y))$ has rank d if and only if it has precisely
d non-zero rows.

[0086] Now suppose that, for some $a_1, a_2, \ldots, a_n \in \mathbb{F}$ satisfying $a_1 + a_2 + \cdots + a_n = n - d + 1$,
the matrix

$$M = [a_1 M_1\ a_2 M_2\ \cdots\ a_n M_n]$$

is singular. Then, there exist $x, y \in \mathbb{F}^k$, $x \ne y$, such that

$$xM = yM.$$

In this case, $f_L(g(x)) - f_L(g(y))$ has an all-zero row for every non-zero coefficient $a_i$. Since there are n - d + 1
non-zero coefficients, $f_L(g(x)) - f_L(g(y))$ has rank less than d. Thus, $\mathcal{C}$ does not achieve dm-level
diversity.

[0087] Conversely, suppose $\mathcal{C}$ does not achieve dm-level diversity. Then, there exist $x, y \in \mathbb{F}^k$,
$x \ne y$, such that the baseband difference $f_L(g(x)) - f_L(g(y))$ has rank less than d. It must therefore have at least
n - d + 1 all-zero rows. Let I denote a set of indices for n - d + 1 such rows, and set $a_i = 1$ for $i \in I$ and
$a_i = 0$ otherwise. Then, the matrix

$$M = [a_1 M_1\ a_2 M_2\ \cdots\ a_n M_n]$$

is singular since $xM = yM$.

[0088] Corollary 5 Full spatial diversity nm is achieved if and only if $M_1, M_2, \ldots, M_n$ are of rank k over the binary field.
[0089] A space-time code that achieves dm-level spatial diversity in a communication system with n transmit and
m receive antennas over the quasi-static fading channel is called a d-space-time code.
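The rank condition of the threaded stacking construction lends itself to a mechanical check. The following Python sketch is an illustration of the condition as stated in Theorem 4 and Corollary 5, not a disclosed design tool: matrices are given as lists of 0/1 rows, and since zeroed blocks do not change the rank, the blocks selected by the non-zero coefficients are simply concatenated. The function names `gf2_rank` and `transmit_diversity` are hypothetical.

```python
from itertools import combinations


def gf2_rank(rows):
    """Rank over the binary field of a matrix given as a list of 0/1 rows."""
    width = len(rows[0])
    # Pack each row into an integer bitmask for XOR-based elimination.
    vecs = [int("".join(str(b) for b in row), 2) for row in rows]
    rank = 0
    for bit in reversed(range(width)):
        pivot_idx = next((i for i, v in enumerate(vecs) if (v >> bit) & 1), None)
        if pivot_idx is None:
            continue
        pivot = vecs.pop(pivot_idx)
        vecs = [v ^ pivot if (v >> bit) & 1 else v for v in vecs]
        rank += 1
    return rank


def transmit_diversity(mats):
    """Largest d such that every choice of n - d + 1 blocks has rank k.

    mats: list of n binary matrices M_1..M_n, each with k rows.
    Returns 0 if even the full concatenation has rank below k.
    """
    n, k = len(mats), len(mats[0])
    for d in range(n, 0, -1):
        if all(
            gf2_rank([sum((mats[i][row] for i in subset), [])
                      for row in range(k)]) == k
            for subset in combinations(range(n), n - d + 1)
        ):
            return d
    return 0
```

When every block individually has rank k, the function returns n, matching the full-diversity condition of Corollary 5; a single rank-deficient block lowers the result.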

[0090] Corollary 6 The maximum transmission rate for a communication system using the threaded layering
architecture with n transmit antennas, a signaling constellation of size $2^b$, and component codes achieving d-level transmit
spatial diversity is b(n - d + 1) bits/sec/Hz.

[0091] Proof: By Theorem 4, in order for the code to achieve d-level spatial diversity, the number of columns in $M_i$
must satisfy $b\ell \ge k/(n - d + 1)$. Then the code rate for C is $k/(nb\ell) \le (n - d + 1)/n$. Therefore, the maximum
transmission rate of each thread is $br \le b(n - d + 1)/n$ bits per signaling interval. Then, the total transmission rate of the n
threads is b(n - d + 1). A different proof can be obtained using the maximum lossless compression transmission rate.
[0092] The following result facilitates the design of space-time threaded codes that allow for exploiting the
temporal diversity and maximizing the efficiency of the iterative multi-user detector.
[0093] Theorem 7 Let $\mathcal{C}$ be a d-space-time code consisting of the binary code C whose code words are of the form

$$g(x) = xM_1 \mid xM_2 \mid \cdots \mid xM_n,$$

where x denotes an arbitrary k-tuple of information bits, and the spatial modulator $f_L$ in which $xM_i$ is assigned to
antenna i along threaded layer L. Given the linear vector-space transformations

$$T_1, \ldots, T_n : \mathbb{F}^{b\ell} \to \mathbb{F}^{b\ell},$$

a new space-time code is constructed by assigning $T_i(xM_i)$ to antenna i along threaded layer L. Then, the new
space-time code achieves the same spatial diversity order dm if $T_1, \ldots, T_n$ are nonsingular.
[0094] In particular, the linear transformation $T_i$ of the previous theorem may be an arbitrary permutation $\pi_i$.
Then, the interleaved space-time code resulting from assigning $\pi_i(xM_i)$ to antenna i along threaded layer L achieves
the same level of diversity as the non-interleaved space-time code $\mathcal{C}$.
[0095] Consider the special case of designing space-time trellis codes for the threaded architecture. The natural
space-time codes discussed in application Serial No. 09/397,896 and associated with binary, rate 1/n, convolutional
codes with periodic bit interleaving are attractive candidates for the threaded space-time architecture as they can be
easily formatted to satisfy the threaded stacking construction. Each output arm from the encoder is transmitted from a
separate antenna. There is no restriction on the interleaving employed by each antenna (i.e., different interleaving can
be used by the different antennas without violating the generalized stacking condition). As discussed earlier, this feature
allows for the design of efficient iterative multi-user receivers. These convolutional codes can be used for a similar
application, that is, the block erasure channel. The main advantage of such codes is the availability of computationally
efficient, soft-input/soft-output decoding algorithms.

[0096] In earlier treatments of space-time trellis codes, only the case in which the underlying code has rate 1/n matched
to the number of transmit antennas has been considered. For the threaded space-time code design of the present invention,
the more general case in which the convolutional code has rate greater than 1/n is used. The treatment includes the
case of rate k/n convolutional codes constructed by puncturing an underlying rate 1/n convolutional code.

[0097] Let C be a binary convolutional code of rate k/n. The encoder processes k binary input sequences
$x_1(t), x_2(t), \ldots, x_k(t)$ and produces n coded output sequences $y_1(t), y_2(t), \ldots, y_n(t)$ which are
multiplexed together to form the output code word.

[0098] For quasi-static fading channels, the input and output sequences of interest are of fixed finite length. In the
more general case, however, the sequences are semi-infinite, indexed by t = 0, 1, 2, .... Let $\mathbb{F}^{\infty}$ denote
the space of all such binary sequences. A sequence $\{x(t)\}_{t=0}^{\infty} \in \mathbb{F}^{\infty}$ is often represented
by the formal series $X(D) = x(0) + x(1)D + x(2)D^2 + \cdots$. Refer to $\{x(t)\} \leftrightarrow X(D)$ as a D-transform
pair. The space $\mathbb{F}[[D]]$ of all formal series is an integral domain whose invertible elements are those that are
not multiples of D.

[0099] The action of the binary convolutional encoder is linear and is characterized by the so-called impulse
responses $g_{i,j}(t)$ associating output $y_j(t)$ with input $x_i(t)$. Specifically,

$$y_j(t) = x_1(t) * g_{1,j}(t) + x_2(t) * g_{2,j}(t) + \cdots + x_k(t) * g_{k,j}(t),$$

where * denotes discrete convolution. Then the D-transform of the j-th output of the convolutional encoder is given by

$$Y_j(D) = X_1(D) G_{1,j}(D) + X_2(D) G_{2,j}(D) + \cdots + X_k(D) G_{k,j}(D).$$

Thus, the encoder action is summarized by the matrix equation

$$Y(D) = X(D) G(D),$$

where

$$Y(D) = [Y_1(D)\ Y_2(D)\ \cdots\ Y_n(D)], \qquad X(D) = [X_1(D)\ X_2(D)\ \cdots\ X_k(D)],$$

and

$$G(D) = \begin{bmatrix} G_{1,1}(D) & G_{1,2}(D) & \cdots & G_{1,n}(D) \\ G_{2,1}(D) & G_{2,2}(D) & \cdots & G_{2,n}(D) \\ \vdots & \vdots & \ddots & \vdots \\ G_{k,1}(D) & G_{k,2}(D) & \cdots & G_{k,n}(D) \end{bmatrix}.$$
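The matrix equation for the encoder can be exercised with binary polynomial arithmetic. The following Python sketch is illustrative only: it stores a polynomial in D as an integer bitmask (bit t is the coefficient of $D^t$), which is a representation chosen for this example, and the function names `gf2_poly_mul` and `encode` are hypothetical.

```python
def gf2_poly_mul(a: int, b: int) -> int:
    """Product of two binary polynomials stored as bitmasks.

    Bit t of each integer is the coefficient of D**t; additions are XORs,
    so this is the D-transform counterpart of binary discrete convolution.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def encode(X, G):
    """Return the list of output polynomials Y_j = sum_i X_i * G[i][j].

    X: list of k input polynomials; G: k x n list of generator polynomials.
    """
    n = len(G[0])
    Y = [0] * n
    for j in range(n):
        for i, x in enumerate(X):
            Y[j] ^= gf2_poly_mul(x, G[i][j])
    return Y
```

For a rate 1/2 code with generators $1 + D^2$ and $1 + D + D^2$, the call `encode([0b11], [[0b101, 0b111]])` encodes the input 1 + D and returns the two output polynomials $1 + D + D^2 + D^3$ and $1 + D^3$.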



[0100] Consider the natural space-time formatting of C in which the output sequence corresponding to $Y_j(D)$ is
assigned to the j-th transmit antenna. The spatial diversity that can be achieved by this scheme is now characterized. A
preferred algebraic analysis technique considers the rank of matrices formed by concatenating the column vectors

$$F_j(D) = [G_{1,j}(D)\ G_{2,j}(D)\ \cdots\ G_{k,j}(D)]^T.$$

Specifically, for $a_1, a_2, \ldots, a_n \in \mathbb{F}$, let

$$F(a_1, a_2, \ldots, a_n) = [a_1 F_1\ a_2 F_2\ \cdots\ a_n F_n].$$

The following theorem relates the spatial diversity of the space-time code $\mathcal{C}$ in the quasi-static fading channel
to the rank of these matrices over $\mathbb{F}[[D]]$.

[0101] Theorem 8 Let $\mathcal{C}$ denote the threaded space-time code consisting of the binary convolutional code C,
whose k x n transfer function matrix is

$$G(D) = [F_1(D)\ F_2(D)\ \cdots\ F_n(D)],$$

and the spatial modulator $f_L$ in which the output

$$Y_j(D) = X(D) \cdot F_j(D)$$

is assigned to antenna j along threaded layer L. Let v be the smallest integer having the property that, whenever
$a_1 + a_2 + \cdots + a_n = v$, the k x n matrix $F(a_1, a_2, \ldots, a_n)$ has full rank k over $\mathbb{F}[[D]]$. Then
the space-time code $\mathcal{C}$ achieves d-level spatial transmit diversity over the quasi-static fading channel where
d = n - v + 1 and v ≥ k.
[0102] Proof: All of the code words of C are of the form

$$Y^T(D) = G^T(D) X^T(D).$$

Under the stipulated conditions of the theorem and following the argument of Theorem 4 (threaded stacking
construction), only the all-zero code word has v or more all-zero rows, so the spatial transmit diversity of $\mathcal{C}$ is
at least n - v + 1. On the other hand, since v is the smallest integer having the stated property, there is some information
sequence X(D) resulting in a code word with v - 1 all-zero rows. Hence, the spatial transmit diversity of $\mathcal{C}$ is
precisely n - v + 1.
[0103] Rate 1/n' convolutional codes with n' < n can also be put into this framework. Let C be a binary convolutional
code with transfer function matrix

$$G(D) = [G_0(D)\ G_1(D)\ \cdots\ G_{n'-1}(D)].$$

The coded bits are to be distributed among n transmit antennas. For simplicity, consider the case in which s = n/n' is
an integer and the coded bits are assigned to the antennas periodically. Thus, for each of the coded bit streams
$Y_i(D) \leftrightarrow \{y_i(t)\}$, the subsequence $y_i(0), y_i(s), y_i(2s), \ldots$ is assigned to antenna si; the
subsequence $y_i(1), y_i(s+1), y_i(2s+1), \ldots$ is assigned to antenna si + 1; and so on. Alternate assignments such as
symbol based demultiplexing would also be possible and can be analyzed using the same framework.

[0104] In general, partition the series X(D) corresponding to {x(t)} into its modulo s components $X_j(D)$
corresponding to the subsequences $\{x(st + j)\}_{t=0}^{\infty}$ (j = 0, 1, 2, ..., s - 1). Then

$$X(D) = X_0(D^s) + D \cdot X_1(D^s) + \cdots + D^{s-1} \cdot X_{s-1}(D^s).$$

Similarly, partition $G_i(D)$ into components $G_{i,j}(D)$ and $Y_i(D)$ into components $Y_{i,j}(D)$. The space-time code
$\mathcal{C}$ under consideration therefore consists of the binary code C together with a spatial modulator function in
which $Y_{i,j}(D)$ is assigned to antenna si + j.

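[0105] By multiplying the expansions for X(D) and $G_i(D)$ and collecting terms, it can be shown that the coded bit
stream assigned to antenna si + j is given by

$$Y_{i,j}(D) = \sum_{l=0}^{s-1} X_l(D)\, F_{i,j,l}(D),$$

where

$$F_{i,j,l}(D) = \begin{cases} G_{i,j-l}(D), & 0 \le l \le j \\ D \cdot G_{i,s+j-l}(D), & j < l \le s - 1. \end{cases}$$

In matrix form,

$$Y_{i,j}(D) = X(D)\, F_{i,j}(D),$$

which is the dot product of row vector

$$X(D) = [X_0(D)\ X_1(D)\ \cdots\ X_{s-1}(D)]$$

and column vector

$$F_{i,j}(D) = [F_{i,j,0}(D)\ \cdots\ F_{i,j,s-1}(D)]^T.$$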


[0106] The theorem now applies directly. The spatial transmit diversity achieved by $\mathcal{C}$ is given by d = n - v + 1,
where v is the smallest integer having the property that, whenever $a_0 + a_1 + \cdots + a_{n-1} = v$, the s x n matrix
$F(a_0, a_1, \ldots, a_{n-1})$ has full rank s. In particular, the best possible spatial transmit diversity is d = n - s + 1.
When n' = n, s = 1 so that full spatial transmit diversity d = n is possible as expected.
[0107] Example. Consider the optimal $d_{free} = 5$ convolutional code with generators $G_0(D) = 1 + D^2$ and
$G_1(D) = 1 + D + D^2$. In the case of two transmit antennas, it is clear that the natural threaded space-time code
achieves d = 2 level diversity.

[0108] In the case of four transmit antennas, note that the rate 1/2 code can be written as a rate 2/4 convolutional
code with generator matrix:

$$G(D) = \begin{bmatrix} 1 + D & 0 & 1 + D & 1 \\ 0 & 1 + D & D & 1 + D \end{bmatrix}.$$

By inspection, every pair of columns is linearly independent over $\mathbb{F}[[D]]$. Hence, the natural periodic
distribution of the code across four transmit antennas produces a threaded space-time code achieving the maximum d = 3
transmit spatial diversity.

[0109] For six transmit antennas, express the code as a rate 3/6 code with generator matrix:

$$G(D) = \begin{bmatrix} 1 & 0 & 1 & 1 & 1 & 1 \\ D & 1 & 0 & D & 1 & 1 \\ 0 & D & 1 & D & D & 1 \end{bmatrix}.$$

Every set of three columns in the generator matrix has full rank over $\mathbb{F}[[D]]$, so the natural space-time code achieves maximum $d = 4$ transmit diversity.
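
The rank condition in the two examples above can be checked mechanically. The following is a minimal Python sketch and not part of the original disclosure. It assumes the polyphase rule $F_{si+j,k}(D) = G_{i,j-k}(D)$ for $k \le j$ and $D \cdot G_{i,j-k+s}(D)$ for $k > j$, which reproduces both generator matrices above, and it represents binary polynomials as integer bit masks (bit t is the coefficient of $D^t$). All function names are illustrative. A nonzero determinant over GF(2)[D] for every choice of s columns implies that those columns are linearly independent over $\mathbb{F}[[D]]$.

```python
from itertools import combinations


def polyphase(g, s):
    """Split a GF(2) polynomial g (bit t = coefficient of D^t) into its s modulo-s components."""
    comps = [0] * s
    t = 0
    while g >> t:
        if (g >> t) & 1:
            comps[t % s] |= 1 << (t // s)
        t += 1
    return comps


def gf2_mul(a, b):
    """Carry-less product of two GF(2) polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def threaded_generator_matrix(generators, s):
    """Return the s x (s * len(generators)) matrix whose column si+j is F_{si+j}(D).

    Assumed rule: F_{si+j,k} = G_{i,j-k} if k <= j, else D * G_{i,j-k+s}.
    """
    columns = []
    for g in generators:
        gc = polyphase(g, s)
        for j in range(s):
            columns.append([gc[j - k] if k <= j else gc[j - k + s] << 1
                            for k in range(s)])
    return [[col[k] for col in columns] for k in range(s)]


def det_gf2(m):
    """Determinant over GF(2)[D] by cofactor expansion (signs vanish in characteristic 2)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c, entry in enumerate(m[0]):
        if entry:
            minor = [row[:c] + row[c + 1:] for row in m[1:]]
            total ^= gf2_mul(entry, det_gf2(minor))
    return total


def every_s_columns_independent(rows):
    """True if every set of s columns of the s x n matrix has a nonzero determinant."""
    s, n = len(rows), len(rows[0])
    return all(det_gf2([[rows[r][c] for c in subset] for r in range(s)]) != 0
               for subset in combinations(range(n), s))


# Generators G0 = 1 + D^2 and G1 = 1 + D + D^2 from the example.
G4 = threaded_generator_matrix([0b101, 0b111], 2)   # four transmit antennas
G6 = threaded_generator_matrix([0b101, 0b111], 3)   # six transmit antennas
print(every_s_columns_independent(G4), every_s_columns_independent(G6))
```

When every set of s columns is independent, the integer v of the preceding paragraph equals s, so the diversity is n - s + 1: 3 for four antennas and 4 for six antennas.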

[0110] Thus far, the design of threaded space-time codes that exploit the spatial diversity over quasi-static fading channels has been considered. One of the advantages of the threaded architecture, however, is its ability to jointly exploit the spatial diversity, provided by the multiple transmit and receive antennas, and the temporal diversity, provided by the time variations in the block fading channel. In fact, the results obtained for threaded space-time code design for the quasi-static fading channel are easily extended to the more general block fading channel.

[0111] In the absence of interference from other threads, the quasi-static fading channel under consideration can be viewed as a block fading channel with receive diversity, where each fading block is represented by a different antenna. For the threaded architecture with n transmit antennas and a quasi-static fading channel, there are n independent and non-interfering fading links per code word that can be exploited for transmit diversity by proper code design. In the case of the block fading channel, there is a total of nB such links, where B is the number of independent fading blocks per code word per antenna. Thus, the problem of block fading code design for the threaded architecture is addressed by simply replacing parameter n by nB.

[0112] For example, the following "multi-stacking construction" is a direct generalization of Theorem 4 to the case 
of a block fading channel. 

[0113] Theorem 9 (Threaded Multi-Stacking Construction) Let L be a threaded layer of spatial span n. Given binary matrices $M_{1,1}, M_{2,1}, \ldots, M_{n,1}, \ldots, M_{1,B}, M_{2,B}, \ldots, M_{n,B}$ of dimension $k \times l$, let C be the binary code of dimension k consisting of all code words of the form

$$g(x) = xM_{1,1} \,|\, xM_{2,1} \,|\, \cdots \,|\, xM_{n,1} \,|\, \cdots \,|\, xM_{1,B} \,|\, xM_{2,B} \,|\, \cdots \,|\, xM_{n,B},$$

where x denotes an arbitrary k-tuple of information bits, and B is the number of independent fading blocks spanning one code word. Let $f_L$ denote the spatial modulator having the property that $\mu(xM_{j,v})$ is transmitted in the symbol intervals of L that are assigned to antenna j in the fading block v.

[0114] Then, as the space-time code in a communication system with n transmit antennas and m receive antennas, the space-time code $\mathcal{C}$ consisting of C and $f_L$ achieves spatial diversity dm in a B-block fading channel if and only if d is the largest integer such that $M_{1,1}, M_{2,1}, \ldots, M_{n,B}$ have the property that

$$\forall\, a_{1,1}, a_{2,1}, \ldots, a_{n,B} \in \mathbb{F},\; a_{1,1} + a_{2,1} + \cdots + a_{n,B} = nB - d + 1:$$

$$M = [a_{1,1}M_{1,1} \;\; a_{2,1}M_{2,1} \;\; \cdots \;\; a_{n,B}M_{n,B}] \text{ is of rank } k \text{ over the binary field.}$$

[0115] Proof: This result is immediate from the equivalent quasi-static model with nB transmit antennas.
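
For small parameters, the condition of Theorem 9 can be evaluated by brute force. The sketch below is illustrative only and not part of the original disclosure. It assumes each matrix $M_{j,v}$ is supplied as a list of k integer bit masks (one per row, l bits wide) and reads the weight constraint as "every selection of nB - d + 1 of the matrices, concatenated side by side, has rank k over GF(2)". The function names are hypothetical.

```python
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of integer row bit masks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low_bit = pivot & -pivot            # lowest set bit acts as the pivot column
            rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank


def concatenate(mats, k, l):
    """Place the k x l matrices side by side, returning k wider row bit masks."""
    out = [0] * k
    for m in mats:
        for r in range(k):
            out[r] = (out[r] << l) | m[r]
    return out


def multistack_diversity(mats, k, l):
    """Largest d such that every choice of len(mats) - d + 1 matrices stacks to rank k.

    Returns 0 if even the concatenation of all matrices is rank deficient.
    """
    total = len(mats)                           # plays the role of nB in the theorem
    best = 0
    for d in range(1, total + 1):
        weight = total - d + 1
        if all(gf2_rank(concatenate([mats[i] for i in subset], k, l)) == k
               for subset in combinations(range(total), weight)):
            best = d
    return best


identity = [0b10, 0b01]      # rank 2
full_rank = [0b01, 0b11]     # rank 2
deficient = [0b11, 0b11]     # rank 1
print(multistack_diversity([identity, full_rank], 2, 2))   # both matrices full rank
print(multistack_diversity([identity, deficient], 2, 2))   # one matrix rank deficient
```

With nB = 2 links, full diversity d = 2 requires each matrix on its own to have rank k, which is why replacing one matrix by a rank-deficient one lowers the result.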
3.3 Performance Bound 

[0116] In this section, the diversity advantage achieved by the threaded architecture when the iterative MMSE algorithm is used is investigated.

[0117] Proposition 10 Let C be a d-diversity code used in each thread in a setting with n transmit and m receive antennas. Then the zero-forcing receiver achieves spatial diversity d' = d · (m - n + 1).

[0118] Proof: To detect the signal transmitted from the i-th antenna, the zero-forcing receiver projects the received signal on the null space of $S^{(n \setminus i)}$. Let $\mathcal{N}_i$ be the null space of $S^{(n \setminus i)}$ and $V_i$ be a $(m - n + 1) \times m$ matrix whose rows are orthonormal vectors of $\mathcal{N}_i$; then the $(m - n + 1) \times 1$ output vector corresponding to

is computed as

(29)

The elements of

are independent Gaussian random variables with

Note that, in general

$\neq 0$.

Hence, at the output of the zero-forcing filter, the channel is equivalent to an interference-free correlated block fading channel with n blocks and m - n + 1 receive antennas. Since the different equivalent Gaussian fading gains are linearly independent, the channel correlation matrix is of full rank. Thus, the diversity order is d · (m - n + 1).
[0119] Let $\mathrm{SIR}_i^{(j)}$ denote the signal-to-interference-plus-noise ratio (SIR) for a symbol transmitted from the i-th antenna after the j-th iteration of the iterative MMSE algorithm. Then, conditioning on the set of path gains,

$$\mathrm{SIR}_i^{(j)} = \frac{\|w_j^T s^{(i)}\|^2}{N_0 \|w_j\|^2 + \sum_{l \neq i} \|w_j^T s^{(l)}\|^2 \left(1 - (\tilde{c}^{(l)})^2\right)},$$

where $w_j$ is the vector of feed-forward filter coefficients used in the j-th iteration.

[0120] Proposition 11 Let C be a d-diversity code used in each thread in a setting with n transmit and m receive antennas. The SIR at the output of the iterative MMSE detector after j iterations is at least as large as the SIR after one iteration. Furthermore, the output SIR is at least as large as that produced by the zero-forcing detector.
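[0121] Proof: If $\mathrm{SIR}_i^{ZF}$ denotes the SIR at the output of the zero-forcing detector, then it follows from the definition of the MMSE receiver that

$$\mathrm{SIR}_i^{(1)} \ge \mathrm{SIR}_i^{ZF}.$$

Also, from the definition of the MMSE filter, it follows that

$$\mathrm{SIR}_i^{(j)} = \frac{\|w_j^T s^{(i)}\|^2}{N_0 \|w_j\|^2 + \sum_{l \neq i} \|w_j^T s^{(l)}\|^2 \left(1 - (\tilde{c}^{(l)})^2\right)} \ge \frac{\|w_1^T s^{(i)}\|^2}{N_0 \|w_1\|^2 + \sum_{l \neq i} \|w_1^T s^{(l)}\|^2} = \mathrm{SIR}_i^{(1)},$$

as was to be shown.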
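
The ordering between the MMSE and zero-forcing output SIR can be illustrated numerically. The following pure-Python sketch is not from the original disclosure. It assumes a real-valued model with two transmit and two receive antennas, unit-energy symbols, no soft feedback (the first MMSE iteration), and Gaussian channel gains drawn at random. The helper names and the noise level n0 are illustrative choices.

```python
import random


def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve2(a, b):
    """Solve the 2x2 real linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def sir(w, h_sig, h_int, n0):
    """Output SIR of the linear filter w for the desired and interfering signature vectors."""
    return dot(w, h_sig) ** 2 / (n0 * dot(w, w) + dot(w, h_int) ** 2)


def zf_filter(h_int):
    """Zero-forcing filter for m = n = 2: any vector orthogonal to the interferer."""
    return [-h_int[1], h_int[0]]


def mmse_filter(h_sig, h_int, n0):
    """MMSE filter w = R^{-1} h_sig with R = h_sig h_sig^T + h_int h_int^T + n0 I."""
    r = [[h_sig[0] * h_sig[0] + h_int[0] * h_int[0] + n0,
          h_sig[0] * h_sig[1] + h_int[0] * h_int[1]],
         [h_sig[1] * h_sig[0] + h_int[1] * h_int[0],
          h_sig[1] * h_sig[1] + h_int[1] * h_int[1] + n0]]
    return solve2(r, h_sig)


def compare_sir(trials=1000, n0=0.1, seed=0):
    """Return (zero-forcing SIR, MMSE SIR) pairs over random channel draws."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(trials):
        h_sig = [rng.gauss(0, 1), rng.gauss(0, 1)]
        h_int = [rng.gauss(0, 1), rng.gauss(0, 1)]
        pairs.append((sir(zf_filter(h_int), h_sig, h_int, n0),
                      sir(mmse_filter(h_sig, h_int, n0), h_sig, h_int, n0)))
    return pairs


pairs = compare_sir()
print(all(mmse >= zf * (1 - 1e-9) for zf, mmse in pairs))
```

The MMSE filter is proportional to the filter that maximizes output SIR among all linear filters, so the comparison holds for every channel draw up to rounding error.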

[0122] The output of the MMSE receiver can be tightly approximated by a Gaussian random variable in additive white Gaussian noise (AWGN) channels. In the space-time code setting, the channel is AWGN when conditioned on the path gains. Thus, the diversity advantage achieved by the iterative MMSE receiver for the threaded architecture is approximately lower bounded by the performance achieved by the zero-forcing receiver. Consequently, in a threaded architecture using d-diversity space-time codes, the iterative MMSE receiver can achieve diversity d' satisfying

$$d \cdot (m - n + 1) \le d' \le dm. \quad (31)$$

This lower bound justifies the approach to code design for the threaded architecture in accordance with the present invention. In particular, the design criteria developed in Theorems 4 and 9 for optimizing the channel coding for each thread in the absence of interference also serve to maximize a lower bound on the diversity advantage when the iterative MMSE detector is used to mitigate the interference from other threads. The simulation results of Section 5 suggest that the lower bound is in fact a pessimistic estimate of the performance of the threaded architecture with iterative MMSE multi-user detection.
4. System Comparisons 

[0123] A high-level comparison of the various architectures is shown in Table I below. As shown in the table, all of the transmission formats achieve comparable efficiency. Here, efficiency refers to the number of information symbols per vector channel use. For example, in the horizontal layering scheme, there are n layers each containing a code word of length lb and rate r. Thus, successful use of all transmission resources provides a total of n · (rlb) information symbols. Normalizing by the total number of symbol transmission intervals l gives an efficiency of nrb information symbols per transmitted symbol interval. For the diagonal layering, the efficiency is somewhat less since the diagonal layers cannot utilize a portion of the transmission resources (the result in the table assumes the width of the diagonal w = 1).

[0124] The diversity orders achieved by the various architectures in both quasi-static and block fading channels are also indicated in Table I. In the different approaches, the channel coding schemes are assumed to achieve the maximum possible diversity level for rate r codes. Since no attempt has been made to optimize the coding for the diagonal layering architecture, the results reported in the table are on a per-symbol basis. In the prior architectures, the diversity order is variable. Table I shows the range of values (minimum value : maximum value) and notes whether the variation is from layer to layer or from symbol to symbol. In the case of the threaded architecture, the diversity order is not variable in this way. Since the exact value is unknown, Table I gives upper and lower bounds. For the block fading channel, the parameter B denotes the number of fading blocks per code word.

[0125] The threaded layering is similar to V-BLAST in that each transmitted symbol in a thread is subject to interference from n - 1 other layers, but better spatial diversity is achieved through more efficient transmit diversity and multi-user detection signal processing. The threaded layering is similar to D-BLAST in that all of the transmit antennas are used equally by each component coded transmission. Threaded layering, however, more fully exploits the available temporal diversity since temporal interleaving is allowed across each transmit antenna. Furthermore, unlike D-BLAST, the threaded layering with space-time code design and iterative multi-user detection algorithms provides uniform spatial diversity from symbol to symbol. Unlike the horizontal multi-layering approach with group interference suppression, the threaded architecture provides uniform performance from one component space-time code to the next. Each component space-time code therefore can, under the ideal interference cancellation assumption, achieve full spatial and temporal diversity.
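
The two structural properties relied on here (every thread is active in every symbol interval, and every thread uses all transmit antennas equally often) can be illustrated with a small Python sketch. The cyclic rule used below, in which thread l occupies antenna (l + t) mod n in interval t, is an assumption chosen for illustration; the disclosure only requires the properties themselves. The function names are hypothetical.

```python
def threaded_layout(n, num_intervals):
    """layout[t][a] = index of the thread that uses antenna a in symbol interval t.

    Assumed cyclic rule: thread l transmits on antenna (l + t) mod n in interval t.
    """
    return [[(a - t) % n for a in range(n)] for t in range(num_intervals)]


def antenna_usage(layout, n):
    """usage[l][a] = number of intervals in which thread l transmits on antenna a."""
    usage = [[0] * n for _ in range(n)]
    for row in layout:
        for antenna, thread in enumerate(row):
            usage[thread][antenna] += 1
    return usage


layout = threaded_layout(4, 8)
for t, row in enumerate(layout):
    print(t, row)                    # each row lists the thread on antennas 0..3
print(antenna_usage(layout, 4))      # every entry equals 8 / 4 = 2
```

Each row of the layout is a permutation of the thread indices, so n symbols (one per thread) are sent in every interval, and over a number of intervals that is a multiple of n each thread visits every antenna the same number of times.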

Table I. Comparison of Different Layered Architectures

5. Performance Comparisons

[0126] In this section, the performance of iterative multi-user detection applied to layered space-time architectures is investigated by means of simulation. The first study considers the conventional layered architecture and demonstrates the advantage of the iterative methods of the present invention over existing zero-forcing techniques. The second study looks at the threaded space-time architecture, in which the layering and channel coding are optimized for iterative multi-user detection, and demonstrates that significant performance improvements can be achieved by this new approach.

[0127] Throughout the simulation study, rate 1/2 convolutional codes are used. The results are obtained by averaging the bit and frame error rates of all the component codes. The channel decoder is based on the soft output Viterbi algorithm (SOVA). The frame length is 100 bits. The iterative MMSE receiver is considered in more detail because it provides a compromise between complexity and performance among the three presented iterative receivers.

[0128] While in essence no restrictions were imposed on the number of receive antennas, the performance of the threaded space-time architecture can be expected to degrade if m ≪ n. The reason is that, in practice, a reasonably large number of receive antennas is required to remove the interference. In the next section, it is shown that excellent performance can be achieved for the case of m = n, showing substantial gain over the group suppression multi-layering approach.

5.1 Layered Space-Time Architecture 

[0129] Figure 7 illustrates the performance of the iterative MMSE receiver with single-dimensional channel codes. The number of transmit and receive antennas is 4 and the bandwidth efficiency of this system is 2 bits/sec/Hz (i.e., BPSK modulation). For comparison purposes, a lower bound on the frame error rate of the LST approach is included, as well as the performance of the single-user system, which assumes the presence of only one transmit antenna and 4 receive antennas. The LST lower bound assumes error-free decision feedback. The other bound is a lower bound on the performance achieved by the optimum receiver.

[0130] It can be seen that the performance of the proposed iterative MMSE receiver 40 is within a fraction of a decibel of the interference-free performance. This is 2 dB better than the best possible performance of LST. In fact, while the performance of the LST is expected to be close to the lower bound at high signal-to-noise ratios, the bound is expected to be loose at low signal-to-noise ratios due to error propagation. Hence, the relative advantage of the iterative MMSE receiver is larger at low signal-to-noise ratios.

[0131] The combination of space-time codes with iterative decoding techniques can be even more potent. In Figure 8, the performance of the combined space-time coding and iterative MMSE reception scheme (Iterative MMSE + STC) described above is illustrated, where the output of each component convolutional code is distributed among two antennas. It is shown that this scheme provides a gain of about 1 dB over the iterative MMSE receiver combined with single-dimensional coding. It is quite remarkable that this gain comes without any additional complexity at either the transmitter or the receiver. Overall, the gain provided by the combined architecture compared with the LST is more than 3 dB in this scenario. This gain increases with the number of antennas because of the superior ability of the present invention to exploit the diversity advantage provided by the multiple transmit and receive antennas.

[0132] Figure 8 illustrates another advantage of the Iterative MMSE + STC scheme. In this figure, the case with 4 transmit and 2 receive antennas is considered. As discussed earlier, the LST receiver architecture cannot be used in this scenario because m < n. This system is equivalent to a space-time multiple access channel with 2 space-time users. The two space-time users have equal energy and use two antennas for their transmission. It has been shown that with 2 receive antennas, the performance of this system is the same as that of a space-time multiple access channel with one space-time user and one receive antenna using the interference cancellation technique (i.e., the second receive antenna is used to remove the interference of the other space-time user). To compare this result with the scheme of the present invention, the performance of 2 transmit/1 receive and 2 transmit/2 receive antennas which transmit 1 bit/sec/Hz (i.e., equivalent to one space-time user) is also depicted in the figure. It is shown that by jointly decoding the two space-time users, using the iterative MMSE technique, a 3-4 dB gain is achieved over the blind interference cancellation scheme, which was shown to be equivalent to the 2 transmit/1 receive scenario.

5.2 Threaded Space-Time Architecture 

[0133] In this section, the performance of the threaded space-time (TST) architecture of the present invention and the combined group interference suppression and space-time coding (TNSC) architecture are compared.

[0134] Figures 9, 10 and 11 compare the performance of the two schemes for the case of 4 transmit/4 receive and 8 transmit/8 receive antennas, respectively. In the TST architecture, periodic bit interleaving was used to distribute the symbols. QPSK modulation with Gray mapping was used to map the binary input to each antenna to a complex constellation. Hence, the spectral efficiency is 4 bits/sec/Hz and 8 bits/sec/Hz, respectively. From the figures, the significant gain provided by the TST over the TNSC scheme is clear. Indeed, the TST approach shows a gain of 3-7 dB over the TNSC scheme. The TST results are within 2-3 dB of the outage capacity.
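
The spectral efficiency figures quoted for the simulations follow from simple arithmetic, sketched below in Python. The sketch assumes one modulated symbol per transmit antenna per symbol interval and one symbol interval per second per hertz; the function name is illustrative and not taken from the disclosure.

```python
from fractions import Fraction


def spectral_efficiency(n_tx, bits_per_symbol, code_rate):
    """Information bits per channel use: antennas x bits per symbol x code rate."""
    return n_tx * bits_per_symbol * code_rate


rate_half = Fraction(1, 2)
print(spectral_efficiency(4, 2, rate_half))   # 4 antennas, QPSK, rate 1/2
print(spectral_efficiency(8, 2, rate_half))   # 8 antennas, QPSK, rate 1/2
print(spectral_efficiency(4, 1, rate_half))   # 4 antennas, BPSK, rate 1/2
```

The three cases correspond to the 4 and 8 bits/sec/Hz QPSK configurations of this section and the 2 bits/sec/Hz BPSK configuration of Section 5.1.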

[0135] Two main reasons contribute to this advantage. The first is the ability of the iterative MMSE receiver 40 of the present invention to eliminate the interference with only a minor loss in the diversity advantage. The second is that the generalized stacking construction, coupled with the use of convolutional codes with the best minimum distance, provides a better coding advantage than the hand-crafted trellis code used by the TNSC architecture. Note also that 8-state codes are used in the TST architecture, while 32-state codes were used in the TNSC architecture. The gain in diversity advantage achieved by the TST architecture can be seen in the steeper asymptotic slope of the performance curve. It is also shown that the gain provided by the TST increases with the number of antennas, again due to the better exploitation of the diversity in the system.

6. Conclusions 

[0136] In accordance with the present invention, the design problem for multiple antenna systems operating over the fading channel is addressed. The problem was addressed herein from both a signal processing and a space-time coding perspective. From the signal processing side, a set of iterative algorithms for joint decoding and detection are presented that provide a trade-off between performance and complexity. Simulation results are provided for the iterative MMSE receiver, establishing its ability to approach the interference-free performance lower bound within a fraction of a dB. From the space-time coding perspective, the BPSK modulation scenario was considered. Under this assumption, convolutional-based space-time codes were presented that exploit the spatial diversity provided by both the transmit and receive antennas. The new scheme avoids some of the limitations of the layered space-time architecture. Then, the more general case of arbitrary non-zero complex constellation was considered and a new scheme, the threaded space-time architecture, was presented in accordance with the present invention. This new approach was shown through simulation to achieve a significant gain over combined array processing and space-time coding.

[0137] As a final remark, in the absence of interference from other threads, the fading channel is equivalent to the 
block fading channel with receive diversity, where each fading block is represented by a different antenna. The algebraic 
framework developed for threaded space-time code design is therefore also useful in the study of code design for block 
fading channels and is applicable to both block and trellis-based codes. 

[0138] Although the present invention has been described with reference to a preferred embodiment thereof, it will be understood that the invention is not limited to the details thereof. Various modifications and substitutions have been suggested in the foregoing description, and others will occur to those of ordinary skill in the art. All such substitutions are intended to be embraced within the scope of the invention as defined in the appended claims.

Claims

1. A method of transmitting symbols (12) in a multi-user wireless communication system (10) having a plurality of transmit antennas (18) and a plurality of receive antennas (24), the method comprising the steps of:

dividing a data stream into multiple threads (62), each of said threads comprising said symbols (60); and

transmitting one of said symbols (60) from each of said threads (62) from respective ones of said plurality of transmit antennas (18) during a symbol transmission interval.

2. The method of claim 1, further comprising the step of encoding (20) each of said threads.

3. The method of claim 2, wherein each of said threads (62) corresponds to a code word comprising a selected number of said symbols (60).

4. The method of claim 3, wherein said encoding step (20) comprises the step of employing space-time codes for transmitting said symbols (60).

5. The method of claim 1, further comprising the step of interleaving each of said threads (62).

6. The method of claim 1, wherein said transmitting step comprises the step of transmitting respective said symbols (60) from each of said threads (62) during respective symbol periods to generate a code matrix having rows and columns, said rows and said columns each comprising a plurality of elements, said elements in one of said rows and columns corresponding to respective ones of said plurality of transmit antennas (18) and said elements in the other one of said rows and columns corresponding to respective ones of said symbols in said threads occurring in the same symbol transmission interval.

7. The method of claim 6, wherein said symbols (60) for one of said threads (62) are allocated among consecutive ones of said elements in said code matrix that correspond to said plurality of transmit antennas (18) during consecutive said symbol periods to represent a diagonal line of said elements in said code matrix.

8. The method of claim 6, wherein each of said elements in said code matrix corresponds to one of said symbols (60) and is not zero.

9. An apparatus (14) for transmitting symbols in a multi-user wireless communication system (10) comprising:

a plurality of transmit antennas (18); and

a processing device operable to divide a data stream into multiple threads (62), each of said threads comprising said symbols (60), and to transmit one of said symbols from each of said threads from respective ones of said plurality of transmit antennas during a symbol transmission interval.

10. The apparatus of claim 9, wherein said processing device is further operable to encode each of said threads (62).

11. The apparatus of claim 10, wherein each of said threads (62) corresponds to a code word comprising a selected number of said symbols.

12. The apparatus of claim 11, wherein said processing device employs space-time codes for transmitting said symbols (60).

13. The apparatus of claim 9, wherein said processing device is further operable to interleave each of said threads (62).

14. The apparatus of claim 9, wherein said processing device is operable to transmit respective said symbols (60) from each of said threads (62) during respective symbol periods to generate a code matrix having rows and columns, said rows and said columns each comprising a plurality of elements, said elements in one of said rows and columns corresponding to respective ones of said plurality of transmit antennas (18) and said elements in the other one of said rows and columns corresponding to respective ones of said symbols in said threads occurring in the same symbol transmission interval.

15. The apparatus of claim 14, wherein said processing device is operable to allocate said symbols (60) for one of said threads (62) among consecutive ones of said elements in said code matrix that correspond to said plurality of transmit antennas (18) during consecutive said symbol periods to represent a diagonal line of said elements in said code matrix.

16. The apparatus of claim 14, wherein said processing device is programmable to provide one of said symbols (60) in each of said elements such that none of said elements is inactive.

17. The apparatus of claim 14, wherein said multi-user wireless communication system (10) comprises a plurality of receive antennas (24), and said plurality of transmit antennas (18) need not be equal in number to said plurality of receive antennas.

18. A method of multi-user detection of symbols transmitted using space-time codes from a plurality of transmit antennas (18) to a plurality of receive antennas (24) that can be subject to spatial interference, the method comprising the steps of:

receiving streams from said plurality of transmit antennas (18);

generating estimates of said streams using a soft input/soft output detector (44);

decoding respective said streams using corresponding soft input/soft output decoders (48); and

refining processing by said soft input/soft output detector using soft outputs generated by said soft input/soft output decoders.

19. A method as claimed in claim 18, further comprising the step of interleaving (50) each of said streams.

20. A method as claimed in claim 19, further comprising the steps of:

deinterleaving (46) said streams prior to said decoding step; and

interleaving (50) said streams prior to said refining step.

21. A method as claimed in claim 18, wherein said decoding step comprises the step of employing a decoding scheme selected from the group consisting of a maximum a-posteriori scheme, a log-domain maximum a-posteriori scheme, and a soft input Viterbi scheme.

22. A method as claimed in claim 18, wherein said streams comprise symbols, said refining step comprising the step of computing maximum a posteriori probabilities for said symbols.

23. A method as claimed in claim 22, further comprising the steps of:

updating said maximum a posteriori probabilities for said symbols after each said decoding step; and

determining minimum mean square error filter feed-forward and feedback coefficients to recalculate said estimates of said streams.

24. A method as claimed in claim 23, wherein said determining step further comprises the steps of:

defining an MMSE estimate of the i-th antenna symbol at time t as

$$y^{(i)} = w_f^{(i)T} r + w_b^{(i)} \quad (32)$$

where $w_f^{(i)}$ is a m x 1 optimized feed-forward coefficients vector, $w_b^{(i)}$ is a single coefficient that represents a soft cancellation, and

corresponds to minimization of the mean square value of the error

$$e = E\left[\,|y^{(i)} - c^{(i)}|^2\,\right]$$

between one of said symbols and the corresponding said estimate thereof such that

$$e = E\left[\,|w_f^{(i)T} r + w_b^{(i)} - c^{(i)}|^2\,\right] = E\left[\,\left|w_f^{(i)T}\left\{s^{(i)} c^{(i)} + S^{(n \setminus i)} c^{(n \setminus i)} + n\right\} + w_b^{(i)} - c^{(i)}\right|^2\,\right] \quad (33)$$

where $s^{(i)}$ is a m x 1 complex signature vector of the i-th one of said transmitter antennas, $S^{(n \setminus i)}$ is a m x (n - 1) matrix composed of complex signature vectors of the other n - 1 said transmitter antennas, and $c^{(n \setminus i)}$ corresponds to a (n - 1) x 1 transmitted data vector from the other n - 1 said transmit antennas;

defining relations satisfied by minimum mean square error solutions for $w_f^{(i)}$ and $w_b^{(i)}$ such that

(34)

where

$$E[n n^H] = N_0 I_{m \times m} \quad (36)$$

solving said relations for optimum filter coefficients

$$w_f^{(i)T} = s^{(i)H}\left(A + B + R_s - FF^H\right)^{-1} \quad (38)$$

$$w_b^{(i)} = -w_f^{(i)T} F \quad (39)$$

such that the log-likelihood ratio is

(40)

wherein

$$B = S^{(n \setminus i)}\left[I_{(n-1) \times (n-1)} - \mathrm{Diag}\left(\tilde{c}^{(n \setminus i)} \tilde{c}^{(n \setminus i)T}\right) + \tilde{c}^{(n \setminus i)} \tilde{c}^{(n \setminus i)T}\right] S^{(n \setminus i)H} \quad (42)$$

$$F = S^{(n \setminus i)} \tilde{c}^{(n \setminus i)} \quad (43)$$

$$A = N_0 I_{m \times m}. \quad (44)$$
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